HAPLOTYPES OF THE FY GENE



RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application Serial No. 60/240,275 filed
October 13, 2000.

FIELD OF THE INVENTION 

This invention relates to variation in genes that encode pharmaceutically-important proteins.
In particular, this invention provides genetic variants of the human Duffy blood group (FY) gene and
methods for identifying which variant(s) of this gene is/are possessed by an individual.

BACKGROUND OF THE INVENTION 

Current methods for identifying pharmaceuticals to treat disease often start by identifying, 
cloning, and expressing an important target protein related to the disease. A determination of whether
an agonist or antagonist is needed to produce an effect that may benefit a patient with the disease is
then made. Then, vast numbers of compounds are screened against the target protein to find new 
potential drugs. The desired outcome of this process is a lead compound that is specific for the target, 
thereby reducing the incidence of the undesired side effects usually caused by activity at non-intended 
targets. The lead compound identified in this screening process then undergoes further in vitro and in
vivo testing to determine its absorption, disposition, metabolism and toxicological profiles. Typically,
this testing involves use of cell lines and animal models with limited, if any, genetic diversity. 

What this approach fails to consider, however, is that natural genetic variability exists between 
individuals in any and every population with respect to pharmaceutically-important proteins, including
the protein targets of candidate drugs, the enzymes that metabolize these drugs and the proteins whose
activity is modulated by such drug targets. Subtle alteration(s) in the primary nucleotide sequence of a
gene encoding a pharmaceutically-important protein may be manifested as significant variation in 
expression, structure and/or function of the protein. Such alterations may explain the relatively high 
degree of uncertainty inherent in the treatment of individuals with a drug whose design is based upon a 
single representative example of the target or enzyme(s) involved in metabolizing the drug. For
example, it is well-established that some drugs frequently have lower efficacy in some individuals than
others, which means such individuals and their physicians must weigh the possible benefit of a larger 
dosage against a greater risk of side effects. Also, there is significant variation in how well people 
metabolize drugs and other exogenous chemicals, resulting in substantial interindividual variation in 
the toxicity and/or efficacy of such exogenous substances (Evans et al., 1999, Science 286:487-491).
This variability in efficacy or toxicity of a drug in genetically-diverse patients makes many drugs
ineffective or even dangerous in certain groups of the population, leading to the failure of such drugs in
clinical trials or their early withdrawal from the market even though they could be highly beneficial for
other groups in the population. This problem significantly increases the time and cost of drug
discovery and development, which is a matter of great public concern.

It is well-recognized by pharmaceutical scientists that considering the impact of the genetic 
variability of pharmaceutically-important proteins in the early phases of drug discovery and 
development is likely to reduce the failure rate of candidate and approved drugs (Marshall A 1997
Nature Biotech 15:1249-52; Kleyn PW et al. 1998 Science 281: 1820-21; Kola I 1999 Curr Opin 
Biotech 10:589-92; Hill AVS et al. 1999 in Evolution in Health and Disease Stearns SS (Ed.) Oxford 
University Press, New York, pp 62-76; Meyer UA. 1999 in Evolution in Health and Disease Stearns 
SS (Ed.) Oxford University Press, New York, pp 41-49; Kalow W et al. 1999 Clin. Pharm. Therap.
66:445-7; Marshall, E 1999 Science 284:406-7; Judson R et al. 2000 Pharmacogenomics 1:1-12; Roses
AD 2000 Nature 405:857-65). However, in practice this has been difficult to do, in large part because 
of the time and cost required for discovering the amount of genetic variation that exists in the 
population (Chakravarti A 1998 Nature Genet 19:216-7; Wang DG et al 1998 Science 280:1077-82; 
Chakravarti A 1999 Nat Genet 21:56-60 (suppl); Stephens JC 1999 Mol Diagnosis 4:309-317; Kwok
PY and Gu S 1999 Mol Med. Today 5:538-43; Davidson S 2000 Nature Biotech 18:1134-5).

The standard for measuring genetic variation among individuals is the haplotype, which is the 
ordered combination of polymorphisms in the sequence of each form of a gene that exists in the 
population. Because haplotypes represent the variation across each form of a gene, they provide a 
more accurate and reliable measurement of genetic variation than individual polymorphisms. For
example, while specific variations in gene sequences have been associated with a particular phenotype
such as disease susceptibility (Roses AD supra; Ulbrecht M et al. 2000 Am J Respir Crit Care Med
161: 469-74) and drug response (Wolfe CR et al. 2000 BMJ 320:987-90; Dahl BS 1997 Acta Psychiatr 
Scand 96 (Suppl 391): 14-21), in many other cases an individual polymorphism may be found in a 
variety of genomic backgrounds, i.e., different haplotypes, and therefore shows no definitive coupling
between the polymorphism and the causative site for the phenotype (Clark AG et al. 1998 Am J Hum
Genet 63:595-612; Ulbrecht M et al. 2000 supra; Drysdale et al. 2000 PNAS 97:10483-10488). Thus, 
there is an unmet need in the pharmaceutical industry for information on what haplotypes exist in the 
population for pharmaceutically-important genes. Such haplotype information would be useful in 
improving the efficiency and output of several steps in the drug discovery and development process,
including target validation, identifying lead compounds, and early phase clinical trials (Marshall et al.,
supra). 

One pharmaceutically-important gene for the treatment of malaria and inflammatory diseases 
is the Duffy blood group (FY) gene or its encoded product. FY, also known as DARC, is a Duffy 
blood group associated glycoprotein that carries Duffy blood group antigens (Iwamoto et al. Blood 
1995 Feb 1;85(3):622-6). The Duffy blood group antigens have been characterized by their roles as
receptors on red blood cells for the malarial parasites and as promiscuous receptors for the chemokine 
superfamily. Malaria parasites, such as Plasmodium vivax, only enter red blood cells when the FY 
protein is present (Rios and Bianco, Semin Hematol 2000; 37(2):177-85). The parasite-specific
binding site, the binding site of chemokines, and the major antigenic domains of the FY protein are
located in overlapping regions at the extracellular N-terminus of the FY protein (Pogo and Chaudhuri,
Semin Hematol 2000; 37(2):122-9). Thus, FY has been associated with malaria and may be involved
in regulation of the level of circulating proinflammatory chemokines (Woolley et al., Transfusion 
2000; 40(8):949-53). 

The Duffy blood group gene is located on chromosome 1q21-q25 and encodes two alternatively
spliced proteins. One spliced form of the mRNA yields a 338 amino acid protein, which was the first 
described form of FY (Figure 3). It was later discovered that alternative splicing of the mRNA gives 
rise to a 336 amino acid form of the protein (not shown) (Iwamoto et al., Blood 1996; 87:378-85). A 
reference sequence for the FY gene is shown in the contiguous lines of Figure 1 (Genaissance
Reference No. 4118011; SEQ ID NO: 1). Reference sequences for the coding sequence (GenBank
Accession No. NM_002036.1) and protein are shown in Figures 2 (SEQ ID NO: 2) and 3 (SEQ ID 
NO: 3), respectively. 

Several polymorphisms in the FY gene have been reported. Tournamille et al. (Nat Genet 
1995; 10(2):224-8) discovered that the molecular basis of the Fy (a-b-) phenotype was due to a
polymorphism of thymine or cytosine in the promoter region of the FY gene. This polymorphism 
corresponds to nucleotide 3470 in Figure 1, and is herein referred to as PS9. The presence of this 
polymorphism, which is common in people of African origin but rare in other ethnic groups, results in 
the absence of the FY glycoprotein from red cells and, therefore, resistance to P. vivax or malarial 
infection (Tournamille et al., Nat Genet 1995; 10(2):224-8; Daniels, Transfus Clin Biol 1997; 
4(4):383-90). Another polymorphism in the FY gene is that of an adenine or guanine at a position 
corresponding to nucleotide position 4140 in Figure 1, herein referred to as PS14. This polymorphism
results in an aspartic acid or glycine amino acid variation (Tournamille et al., Hum Genet 1995;
95(4):407-410). There are two other polymorphisms in the FY gene that cause amino acid variations
in the FY protein. These polymorphisms include a cytosine or thymine at a position corresponding to
nucleotide position 4280 (herein referred to as PS16) and a guanine or adenine at a position
corresponding to nucleotide position 4313 (herein referred to as PS17) in Figure 1 (Yazdanbakhsh et
al. Transfusion 2000 Mar;40(3):310-20). These polymorphisms result in an arginine or cysteine amino
acid variation at a position corresponding to amino acid position 91 (R91C) and an alanine or
threonine amino acid variation at a position corresponding to amino acid position 102 (A102T) in
Figure 3, respectively. In the case of the R91C amino acid variation, the polymorphism results in a 
considerable change to the chemical nature of the protein, suggesting that this polymorphism may 
affect antigenic determinants of FY, and therefore, may be of clinical significance (Parasol et al., 
Blood 92: 2237-2243). 

Because of the potential for variation in the FY gene to affect the expression and function of 
the encoded protein, it would be useful to know whether additional polymorphisms exist in the FY 
gene, as well as how such polymorphisms are combined in different copies of the gene. Such
information could be applied for studying the biological function of FY as well as in identifying drugs 
targeting this protein for the treatment of disorders related to its abnormal expression or function. 

SUMMARY OF THE INVENTION 

Accordingly, the inventors herein have discovered 16 novel polymorphic sites in the FY gene.
These polymorphic sites (PS) correspond to the following nucleotide positions in Figure 1: 2690
(PS1), 2864 (PS2), 2882 (PS3), 2910 (PS4), 2949 (PS5), 2980 (PS6), 2996 (PS7), 3259 (PS8), 3672
(PS10), 3707 (PS11), 3979 (PS12), 3997 (PS13), 4214 (PS15), 4617 (PS18), 4618 (PS19) and 4992
(PS20). The polymorphisms at these sites are cytosine or thymine at PS1, guanine or adenine at PS2,
adenine or guanine at PS3, cytosine or thymine at PS4, cytosine or adenine at PS5, guanine or cytosine
at PS6, cytosine or thymine at PS7, thymine or cytosine at PS8, cytosine or thymine at PS10, cytosine
or thymine at PS11, adenine or guanine at PS12, cytosine or thymine at PS13, cytosine or thymine at
PS15, cytosine or thymine at PS18, guanine or adenine at PS19 and cytosine or thymine at PS20. In
addition, the inventors have determined the identity of the alleles at these sites, as well as at the
previously identified sites at nucleotide positions 3470 (PS9), 4140 (PS14), 4280 (PS16) and 4313
(PS17), in a human reference population of 79 unrelated individuals self-identified as belonging to one
of four major population groups: African descent, Asian, Caucasian and Hispanic/Latino. From this
information, the inventors deduced a set of haplotypes and haplotype pairs for PS1-PS20 in the FY
gene, which are shown below in Tables 4 and 3, respectively. Each of these FY haplotypes constitutes
a code that defines the variant nucleotides that exist in the human population at this set of polymorphic
sites in the FY gene. Thus each FY haplotype also represents a naturally-occurring isoform (also
referred to herein as an "isogene") of the FY gene. The frequency of each haplotype and haplotype
pair within the total reference population and within each of the four major population groups included
in the reference population was also determined.
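
By way of illustration only, the twenty polymorphic sites described above can be collected into a single lookup table. The following is a minimal sketch in Python. The positions and alleles are transcribed from the text of this specification. The names FY_SITES, novel_sites and is_valid_haplotype are hypothetical, as is the convention of writing a haplotype as a 20-letter string ordered PS1 through PS20.

```python
# Polymorphic sites of the FY gene, transcribed from the specification text.
# site -> (nucleotide position in Figure 1, (first allele, second allele), novel?)
# The dictionary layout and all function names are illustrative assumptions.
FY_SITES = {
    "PS1": (2690, ("C", "T"), True),
    "PS2": (2864, ("G", "A"), True),
    "PS3": (2882, ("A", "G"), True),
    "PS4": (2910, ("C", "T"), True),
    "PS5": (2949, ("C", "A"), True),
    "PS6": (2980, ("G", "C"), True),
    "PS7": (2996, ("C", "T"), True),
    "PS8": (3259, ("T", "C"), True),
    "PS9": (3470, ("T", "C"), False),
    "PS10": (3672, ("C", "T"), True),
    "PS11": (3707, ("C", "T"), True),
    "PS12": (3979, ("A", "G"), True),
    "PS13": (3997, ("C", "T"), True),
    "PS14": (4140, ("A", "G"), False),
    "PS15": (4214, ("C", "T"), True),
    "PS16": (4280, ("C", "T"), False),
    "PS17": (4313, ("G", "A"), False),
    "PS18": (4617, ("C", "T"), True),
    "PS19": (4618, ("G", "A"), True),
    "PS20": (4992, ("C", "T"), True),
}


def novel_sites():
    """Names of the sites described in the text as novel."""
    return [name for name, (_, _, novel) in FY_SITES.items() if novel]


def is_valid_haplotype(haplotype):
    """True if the string has one permitted allele at each of PS1-PS20, in order."""
    if len(haplotype) != len(FY_SITES):
        return False
    return all(
        base in alleles
        for base, (_, alleles, _) in zip(haplotype, FY_SITES.values())
    )


# A string built from the first-listed allele at every site. It is used only to
# exercise the check and is not asserted to be one of the haplotypes of Table 4.
first_alleles = "".join(alleles[0] for _, alleles, _ in FY_SITES.values())
```

A table of this kind keeps the site order, position and permitted alleles in one place, so that a reported haplotype can be checked for length and allele validity before any further analysis.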

Thus, in one embodiment, the invention provides a method, composition and kit for 
genotyping the FY gene in an individual. The genotyping method comprises identifying the nucleotide 
pair that is present at one or more polymorphic sites selected from the group consisting of PS1, PS2, 
PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20 in both copies
of the FY gene from the individual. A genotyping composition of the invention comprises an 
oligonucleotide probe or primer which is designed to specifically hybridize to a target region 
containing, or adjacent to, one of these novel FY polymorphic sites. A genotyping kit of the invention 
comprises a set of oligonucleotides designed to genotype each of these novel FY polymorphic sites. In 
a preferred embodiment, the genotyping kit comprises a set of oligonucleotides designed to genotype 
each of PS1-PS20. The genotyping method, composition, and kit are useful in determining whether 
an individual has one of the haplotypes in Table 4 below or has one of the haplotype pairs in Table 3 
below.

The invention also provides a method for haplotyping the FY gene in an individual. In one
embodiment, the haplotyping method comprises determining, for one copy of the FY gene, the identity
of the nucleotide at one or more polymorphic sites selected from the group consisting of PS1, PS2,
PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20. In another
embodiment, the haplotyping method comprises determining whether one copy of the individual's FY
gene is defined by one of the FY haplotypes shown in Table 4, below, or a sub-haplotype thereof. In a
preferred embodiment, the haplotyping method comprises determining whether both copies of the
individual's FY gene are defined by one of the FY haplotype pairs shown in Table 3 below, or a sub-
haplotype pair thereof. Establishing the FY haplotype or haplotype pair of an individual is useful for
improving the efficiency and reliability of several steps in the discovery and development of drugs for
treating diseases associated with FY activity, e.g., malaria and inflammatory disorders.
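
The distinction between genotyping (nucleotide pairs from both copies of the gene) and haplotyping (the nucleotides carried together on one copy) can be illustrated with a short sketch. The Python code below is an assumption-laden illustration, not the inventors' method: the function genotype_from_pair and the three-site sub-haplotypes are hypothetical and are not taken from Tables 3 or 4. It shows that two different haplotype pairs can collapse to the same unphased genotype, which is why a genotype alone does not always establish the haplotype pair.

```python
def genotype_from_pair(hap_a, hap_b):
    """Collapse two phased haplotypes into an unphased genotype.

    Each site becomes a sorted nucleotide pair such as "C/T", so the result
    no longer records which copy of the gene carried which allele.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same polymorphic sites")
    return tuple("/".join(sorted(pair)) for pair in zip(hap_a, hap_b))


# Hypothetical three-site sub-haplotypes, chosen only for illustration.
pair_1 = ("CGA", "TAA")
pair_2 = ("CAA", "TGA")

genotype_1 = genotype_from_pair(*pair_1)
genotype_2 = genotype_from_pair(*pair_2)

# Both pairs are heterozygous at the first two sites and homozygous at the
# third, so their unphased genotypes are identical although the pairs differ.
same_genotype = genotype_1 == genotype_2
```

In this example the two pairs differ in which alleles travel together at the first two sites, yet the genotype is the same for both. Resolving that ambiguity is the purpose of haplotyping.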

For example, the haplotyping method can be used by the pharmaceutical research scientist to 
validate FY as a candidate target for treating a specific condition or disease predicted to be associated 
with FY activity. Determining for a particular population the frequency of one or more of the
individual FY haplotypes or haplotype pairs described herein will facilitate a decision on whether to
pursue FY as a target for treating the specific disease of interest. In particular, if variable FY activity 
is associated with the disease, then one or more FY haplotypes or haplotype pairs will be found at a 
higher frequency in disease cohorts than in appropriately genetically matched controls. Conversely, if 
each of the observed FY haplotypes is of similar frequency in the disease and control groups, then it
may be inferred that variable FY activity has little, if any, involvement with that disease. In either
case, the pharmaceutical research scientist can, without a priori knowledge as to the phenotypic effect
of any FY haplotype or haplotype pair, apply the information derived from detecting FY haplotypes in
an individual to decide whether modulating FY activity would be useful in treating the disease.

The claimed invention is also useful in screening for compounds targeting FY to treat a
specific condition or disease predicted to be associated with FY activity. For example, detecting which
of the FY haplotypes or haplotype pairs disclosed herein are present in individual members of a 
population with the specific disease of interest enables the pharmaceutical scientist to screen for a 
compound(s) that displays the highest desired agonist or antagonist activity for each of the FY 
isoforms present in the disease population, or for only the most frequent FY isoforms present in the
disease population. Thus, without requiring any a priori knowledge of the phenotypic effect of any
particular FY haplotype or haplotype pair, the claimed haplotyping method provides the scientist with 
a tool to identify lead compounds that are more likely to show efficacy in clinical trials. 

Haplotyping the FY gene in an individual is also useful in the design of clinical trials of 
candidate drugs for treating a specific condition or disease predicted to be associated with FY activity.
For example, instead of randomly assigning patients with the disease of interest to the treatment or
control group as is typically done now, determining which of the FY haplotype(s) disclosed herein are
present in individual patients enables the pharmaceutical scientist to distribute FY haplotypes and/or
haplotype pairs evenly to treatment and control groups, thereby reducing the potential for bias in the
results that could be introduced by a larger frequency of a FY haplotype or haplotype pair that is
associated with response to the drug being studied in the trial, even if this association was previously
unknown. Thus, by practicing the claimed invention, the scientist can more confidently rely on the
information learned from the trial, without first determining the phenotypic effect of any FY haplotype
or haplotype pair.
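
One way the even distribution described above might be achieved is stratified randomization, in which patients are grouped by haplotype pair and each group is split between the two arms. The Python sketch below is offered under that assumption. The specification does not prescribe any particular allocation procedure, and the function name stratified_assignment, the patient identifiers and the haplotype pair labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_assignment(patients, seed=0):
    """Assign patients to 'treatment' or 'control', balancing each haplotype pair.

    `patients` maps a patient identifier to a haplotype pair label. Within each
    haplotype pair stratum the patients are shuffled and assigned alternately,
    so the two arms differ by at most one patient per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient_id, hap_pair in patients.items():
        strata[hap_pair].append(patient_id)

    assignment = {}
    for hap_pair in sorted(strata):
        members = sorted(strata[hap_pair])
        rng.shuffle(members)
        # Randomize which arm receives the extra patient in an odd-sized stratum.
        offset = rng.randrange(2)
        for index, patient_id in enumerate(members):
            arm = "treatment" if (index + offset) % 2 == 0 else "control"
            assignment[patient_id] = arm
    return assignment


# Hypothetical cohort: four patients with pair "1/1" and three with pair "1/2".
cohort = {f"patient{i}": ("1/1" if i < 4 else "1/2") for i in range(7)}
arms = stratified_assignment(cohort, seed=1)
```

Because allocation is random within each stratum but balanced across arms, a haplotype pair that happens to influence drug response cannot end up concentrated in one arm by chance.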

In another embodiment, the invention provides a method for identifying an association 
between a trait and a FY genotype, haplotype, or haplotype pair for one or more of the novel 
polymorphic sites described herein. The method comprises comparing the frequency of the FY
genotype, haplotype, or haplotype pair in a population exhibiting the trait with the frequency of the FY
genotype or haplotype in a reference population. A higher frequency of the FY genotype, haplotype, 
or haplotype pair in the trait population than in the reference population indicates the trait is associated 
with the FY genotype, haplotype, or haplotype pair. In preferred embodiments, the trait is 
susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug. In a
particularly preferred embodiment, the FY haplotype is selected from the haplotypes shown in Table 4,
or a sub-haplotype thereof. Such methods have applicability in developing diagnostic tests and 
therapeutic treatments for malaria and inflammatory disorders. 
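
The frequency comparison at the heart of this method can be sketched as follows. The Python code below is a minimal illustration under stated assumptions: haplotypes are identified by arbitrary numeric labels, the two small cohorts are invented for the example, and the functions haplotype_frequencies and enriched_haplotypes are hypothetical names. The sketch only reports which haplotypes are more frequent in the trait population. In practice a statistical test would be needed before concluding that a difference in frequency reflects a real association.

```python
from collections import Counter


def haplotype_frequencies(haplotype_pairs):
    """Frequency of each haplotype among all gene copies in a group.

    `haplotype_pairs` holds one (haplotype, haplotype) tuple per individual,
    so every individual contributes two gene copies to the count.
    """
    counts = Counter(hap for pair in haplotype_pairs for hap in pair)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def enriched_haplotypes(trait_pairs, reference_pairs):
    """Haplotypes whose frequency is higher in the trait group than in the reference."""
    trait = haplotype_frequencies(trait_pairs)
    reference = haplotype_frequencies(reference_pairs)
    return {
        hap: (freq, reference.get(hap, 0.0))
        for hap, freq in trait.items()
        if freq > reference.get(hap, 0.0)
    }


# Invented haplotype pairs (labels 1-3) for three individuals per group.
trait_group = [(1, 2), (2, 2), (2, 3)]
reference_group = [(1, 1), (1, 2), (1, 3)]

candidates = enriched_haplotypes(trait_group, reference_group)
```

With these invented data, haplotype 2 accounts for four of the six gene copies in the trait group but only one of six in the reference group, so it is the only candidate returned.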

In yet another embodiment, the invention provides an isolated polynucleotide comprising a 
nucleotide sequence which is a polymorphic variant of a reference sequence for the FY gene or a
fragment thereof. The reference sequence comprises the contiguous sequences shown in Figure 1 and
the polymorphic variant comprises at least one polymorphism selected from the group consisting of
thymine at PS1, adenine at PS2, guanine at PS3, thymine at PS4, adenine at PS5, cytosine at PS6,
thymine at PS7, cytosine at PS8, thymine at PS10, thymine at PS11, guanine at PS12, thymine at PS13,
thymine at PS15, thymine at PS18, adenine at PS19 and thymine at PS20. In a preferred embodiment,
the polymorphic variant comprises one or more additional polymorphisms selected from the group
consisting of cytosine at PS9, guanine at PS14, thymine at PS16 and adenine at PS17.

A particularly preferred polymorphic variant is an isogene of the FY gene. A FY isogene of 
the invention comprises cytosine or thymine at PS1, guanine or adenine at PS2, adenine or guanine at
PS3, cytosine or thymine at PS4, cytosine or adenine at PS5, guanine or cytosine at PS6, cytosine or
thymine at PS7, thymine or cytosine at PS8, thymine or cytosine at PS9, cytosine or thymine at PS10,
cytosine or thymine at PS11, adenine or guanine at PS12, cytosine or thymine at PS13, adenine or
guanine at PS14, cytosine or thymine at PS15, cytosine or thymine at PS16, guanine or adenine at
PS17, cytosine or thymine at PS18, guanine or adenine at PS19 and cytosine or thymine at PS20. The
invention also provides a collection of FY isogenes, referred to herein as a FY genome anthology.

In another embodiment, the invention provides a polynucleotide comprising a polymorphic
variant of a reference sequence for a FY cDNA or a fragment thereof. The reference sequence
comprises SEQ ID NO:2 (Fig. 2) and the polymorphic cDNA comprises at least one polymorphism
selected from the group consisting of thymine at a position corresponding to nucleotide 205, thymine
at a position corresponding to nucleotide 608, adenine at a position corresponding to nucleotide 609
and thymine at a position corresponding to nucleotide 983. In a preferred embodiment, the
polymorphic variant comprises one or more additional polymorphisms selected from the group
consisting of guanine at a position corresponding to nucleotide 131, thymine at a position
corresponding to nucleotide 271 and adenine at a position corresponding to nucleotide 304. A
particularly preferred polymorphic cDNA variant comprises the coding sequence of a FY isogene
defined by haplotypes 2-4, 6, 9, 10-12, and 16-18.

Polynucleotides complementary to these FY genomic and cDNA variants are also provided by 
the invention. It is believed that polymorphic variants of the FY gene will be useful in studying the 
expression and function of FY, and in expressing FY protein for use in screening for candidate drugs
to treat diseases related to FY activity.

In other embodiments, the invention provides a recombinant expression vector comprising one 
of the polymorphic genomic and cDNA variants operably linked to expression regulatory elements as 
well as a recombinant host cell transformed or transfected with the expression vector. The 
recombinant vector and host cell may be used to express FY for protein structure analysis and drug 
binding studies. 

In yet another embodiment, the invention provides a polypeptide comprising a polymorphic 
variant of a reference amino acid sequence for the FY protein. The reference amino acid sequence 
comprises SEQ ID NO:3 (Fig.3) and the polymorphic variant comprises at least one variant amino acid 
selected from the group consisting of phenylalanine at a position corresponding to amino acid position 
69, isoleucine at a position corresponding to amino acid position 203, isoleucine at a position 
corresponding to amino acid position 203 and phenylalanine at a position corresponding to amino acid 
position 328. In some embodiments, the polymorphic variant also comprises at least one variant amino 
acid selected from the group consisting of glycine at a position corresponding to amino acid position
44, cysteine at a position corresponding to amino acid position 91 and threonine at a position
corresponding to amino acid position 102. A polymorphic variant of FY is useful in studying the 
effect of the variation on the biological activity of FY as well as on the binding affinity of candidate 
drugs targeting FY for the treatment of malaria and inflammatory disorders. 
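
The amino acid positions recited above follow arithmetically from the coding-sequence positions recited for SEQ ID NO:2, because each codon spans three nucleotides. The Python sketch below shows the conversion. It assumes that nucleotide numbering in the coding sequence begins at 1 with the first base of the start codon. The function names codon_number and codon_offset are hypothetical.

```python
def codon_number(cds_position):
    """1-based amino acid (codon) number containing a 1-based coding-sequence position."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions are numbered from 1")
    return (cds_position + 2) // 3


def codon_offset(cds_position):
    """Position of the nucleotide within its codon: 1, 2 or 3."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions are numbered from 1")
    return (cds_position - 1) % 3 + 1


# Coding-sequence positions recited in the text for the polymorphic cDNA variants.
CDS_POSITIONS = [131, 205, 271, 304, 608, 609, 983]

# Maps each nucleotide position to (amino acid position, position within codon).
codon_map = {pos: (codon_number(pos), codon_offset(pos)) for pos in CDS_POSITIONS}
```

Under this numbering, nucleotides 131, 205, 271, 304 and 983 fall in codons 44, 69, 91, 102 and 328, and nucleotides 608 and 609 are the second and third bases of the same codon, 203. This agrees with the amino acid positions recited in the text.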

The present invention also provides antibodies that recognize and bind to the above 
polymorphic FY protein variant. Such antibodies can be utilized in a variety of diagnostic and 
prognostic formats and therapeutic methods. 

The present invention also provides nonhuman transgenic animals comprising one or more of 
the FY polymorphic genomic variants described herein and methods for producing such animals. The 
transgenic animals are useful for studying expression of the FY isogenes in vivo, for in vivo screening 
and testing of drugs targeted against FY protein, and for testing the efficacy of therapeutic agents and 
compounds in a biological system.

The present invention also provides a computer system for storing and displaying
polymorphism data determined for the FY gene. The computer system comprises a computer 
processing unit; a display; and a database containing the polymorphism data. The polymorphism data 
includes one or more of the following: the polymorphisms, the genotypes, the haplotypes, and the
haplotype pairs identified for the FY gene in a reference population. In a preferred embodiment, the
computer system is capable of producing a display showing FY haplotypes organized according to 
their evolutionary relationships. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates a reference sequence for the FY gene (Genaissance Reference No.
4118011; contiguous lines), with the start and stop positions of each region of coding sequence
indicated with a bracket ([ or ]) and the numerical position below the sequence and the polymorphic
site(s) and polymorphism(s) identified by Applicants in a reference population indicated by the variant
nucleotide positioned below the polymorphic site in the sequence. SEQ ID NO:1 is equivalent to
Figure 1, with the two alternative allelic variants of each polymorphic site indicated by the appropriate
nucleotide symbol (R= G or A, Y= T or C, M= A or C, K= G or T, S= G or C, and W= A or T; WIPO
standard ST.25). SEQ ID NO:84 is a modified version of SEQ ID NO:1 that shows the context
sequence of each polymorphic site, PS1-PS20, in a uniform format to facilitate electronic searching.
For each polymorphic site, SEQ ID NO:84 contains a block of 60 bases of the nucleotide sequence
encompassing the centrally-located polymorphic site at the 30th position, followed by 60 bases of
unspecified sequence to represent that each PS is separated by genomic sequence whose composition is
defined elsewhere herein.
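
The uniform context format described for SEQ ID NO:84 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used to prepare the sequence listing: the function name context_block and its parameters are hypothetical, the unspecified bases are written as "N", and the input sequence in the usage lines is synthetic, not the FY reference sequence. The two-allele symbols are those recited above.

```python
# Two-allele nucleotide symbols as recited in the text (R, Y, M, K, S, W).
AMBIGUITY_CODES = {
    frozenset("GA"): "R",
    frozenset("TC"): "Y",
    frozenset("AC"): "M",
    frozenset("GT"): "K",
    frozenset("GC"): "S",
    frozenset("AT"): "W",
}


def context_block(sequence, position, alleles,
                  block_len=60, site_index=30, spacer_len=60):
    """Build one context block for a polymorphic site.

    `position` is the 1-based location of the site in `sequence`. The block is
    `block_len` bases long with the site, written as an ambiguity symbol, at
    1-based index `site_index`, followed by `spacer_len` unspecified bases.
    """
    start = position - site_index          # 0-based start of the window
    end = start + block_len
    if start < 0 or end > len(sequence):
        raise ValueError("site is too close to an end of the sequence")
    window = list(sequence[start:end])
    window[site_index - 1] = AMBIGUITY_CODES[frozenset(alleles)]
    return "".join(window) + "N" * spacer_len


# Synthetic 200-base sequence used only to exercise the function.
synthetic = "".join("ACGT"[i % 4] for i in range(200))
block = context_block(synthetic, position=100, alleles=("C", "T"))
```

Concatenating one such block per site, in the order PS1 through PS20, would give a sequence in which every polymorphic site sits at a predictable offset, which is what makes the format convenient for electronic searching.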

Figure 2 illustrates a reference sequence for the FY coding sequence (contiguous lines; SEQ
ID NO:2), with the polymorphic site(s) and polymorphism(s) identified by Applicants in a reference
population indicated by the variant nucleotide positioned below the polymorphic site in the sequence.

Figure 3 illustrates a reference sequence for the FY protein (contiguous lines; SEQ ID NO:3), 
with the variant amino acid(s) caused by the polymorphism(s) of Figure 2 positioned below the 
polymorphic site in the sequence. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is based on the discovery of novel variants of the FY gene. As 
described in more detail below, the inventors herein discovered 23 isogenes of the FY gene by 
characterizing the FY gene found in genomic DNAs isolated from an Index Repository that contains 
immortalized cell lines from one chimpanzee and 93 human individuals. The human individuals
included a reference population of 79 unrelated individuals self-identified as belonging to one of four
major population groups: Caucasian (21 individuals), African descent (20 individuals), Asian (20
individuals), or Hispanic/Latino (18 individuals). To the extent possible, the members of this reference
population were organized into population subgroups by their self-identified ethnogeographic origin as
shown in Table 1 below. In addition, the Index Repository contains three unrelated indigenous
American Indians (one from each of North, Central and South America), one three-generation
Caucasian family (from the CEPH Utah cohort) and one two-generation African-American family.



Population Group    Population Subgroup                    No. of Individuals
African descent                                            20
                    Sierra Leone                           1
Asian                                                      20
                    Burma                                  1
                    China                                  3
                    Japan                                  6
                    Korea                                  1
                    Philippines                            5
                    Vietnam                                4
Caucasian                                                  21
                    British Isles                          3
                    British Isles/Central                  4
                    British Isles/Eastern                  1
                    Central/Eastern                        1
                    Eastern                                3
                    Central/Mediterranean                  1
                    Mediterranean                          2
                    Scandinavian                           2
Hispanic/Latino                                            18
                    Caribbean                              8
                    Caribbean (Spanish Descent)            2
                    Central American (Spanish Descent)     1
                    Mexican American                       4
                    South American (Spanish Descent)       3

The FY isogenes present in the human reference population are defined by haplotypes for 20
polymorphic sites in the FY gene, 16 of which are believed to be novel. The FY polymorphic sites
identified by the inventors are referred to as PS1-PS20 to designate the order in which they are located
in the gene (see Table 2 below), with the novel polymorphic sites referred to as PS1, PS2, PS3, PS4,
PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20. Using the genotypes
identified in the Index Repository for PS1-PS20 and the methodology described in the Examples
below, the inventors herein also determined the pair of haplotypes for the FY gene present in
individual human members of this repository. The human genotypes and haplotypes found in the
repository for the FY gene include those shown in Tables 3 and 4, respectively. The polymorphism
and haplotype data disclosed herein are useful for validating whether FY is a suitable target for drugs
to treat malaria and inflammatory disorders, screening for such drugs and reducing bias in clinical
trials of such drugs.

In the context of this disclosure, the following terms shall be defined as follows unless
otherwise indicated:

Allele - A particular form of a genetic locus, distinguished from other forms by its particular
nucleotide sequence. 

Candidate Gene - A gene which is hypothesized to be responsible for a disease, condition, or 
the response to a treatment, or to be correlated with one of these. 

Gene - A segment of DNA that contains all the information for the regulated biosynthesis of an 
RNA product, including promoters, exons, introns, and other untranslated regions that control 
expression. 

Genotype - An unphased 5' to 3' sequence of nucleotide pair(s) found at one or more
polymorphic sites in a locus on a pair of homologous chromosomes in an individual. As used herein, 
genotype includes a full-genotype and/or a sub-genotype as described below. 

Full-genotype - The unphased 5' to 3' sequence of nucleotide pairs found at all polymorphic
sites examined herein in a locus on a pair of homologous chromosomes in a single individual.

Sub-genotype - The unphased 5' to 3' sequence of nucleotides seen at a subset of the
polymorphic sites examined herein in a locus on a pair of homologous chromosomes in a single 
individual. 

Genotyping - A process for determining a genotype of an individual. 

Haplotype - A 5' to 3' sequence of nucleotides found at one or more polymorphic sites in a
locus on a single chromosome from a single individual. As used herein, haplotype includes a full- 
haplotype and/or a sub-haplotype as described below. 

Full-haplotype - The 5' to 3' sequence of nucleotides found at all polymorphic sites examined
herein in a locus on a single chromosome from a single individual. 

Sub-haplotype - The 5' to 3' sequence of nucleotides seen at a subset of the polymorphic sites
examined herein in a locus on a single chromosome from a single individual. 

Haplotype pair - The two haplotypes found for a locus in a single individual.

Haplotyping - A process for determining one or more haplotypes in an individual and includes 
use of family pedigrees, molecular techniques and/or statistical inference. 

Haplotype data - Information concerning one or more of the following for a specific gene: a 
listing of the haplotype pairs in each individual in a population; a listing of the different haplotypes in 
a population; frequency of each haplotype in that or other populations, and any known associations 
between one or more haplotypes and a trait. 

Isoform - A particular form of a gene, mRNA, cDNA, coding sequence or the protein encoded 
thereby, distinguished from other forms by its particular sequence and/or structure. 

Isogene - One of the isoforms (e.g., alleles) of a gene found in a population. An isogene (or 
allele) contains all of the polymorphisms present in the particular isoform of the gene. 

Isolated - As applied to a biological molecule such as RNA, DNA, oligonucleotide, or protein, 
isolated means the molecule is substantially free of other biological molecules such as nucleic acids, 
proteins, lipids, carbohydrates, or other material such as cellular debris and growth media. Generally,
the term "isolated" is not intended to refer to a complete absence of such material or to absence of
water, buffers, or salts, unless they are present in amounts that substantially interfere with the methods 
of the present invention. 

Locus - A location on a chromosome or DNA molecule corresponding to a gene or a physical 
or phenotypic feature, where physical features include polymorphic sites. 

Naturally-occurring - A term used to designate that the object it is applied to, e.g., naturally- 
occurring polynucleotide or polypeptide, can be isolated from a source in nature and which has not 
been intentionally modified by man. 

Nucleotide pair - The nucleotides found at a polymorphic site on the two copies of a 
chromosome from an individual. 

Phased - As applied to a sequence of nucleotide pairs for two or more polymorphic sites in a 
locus, phased means the combination of nucleotides present at those polymorphic sites on a single 
copy of the locus is known. 

Polymorphic site (PS) - A position on a chromosome or DNA molecule at which at least two 
alternative sequences are found in a population. 

Polymorphic variant (variant) - A gene, mRNA, cDNA, polypeptide, protein or peptide
whose nucleotide or amino acid sequence varies from a reference sequence due to the presence of a 
polymorphism in the gene. 

Polymorphism - The sequence variation observed in an individual at a polymorphic site. 
Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but 
need not, result in detectable differences in gene expression or protein function.

Polymorphism data - Information concerning one or more of the following for a specific 
gene: location of polymorphic sites; sequence variation at those sites; frequency of polymorphisms in 
one or more populations; the different genotypes and/or haplotypes determined for the gene; frequency 
of one or more of these genotypes and/or haplotypes in one or more populations; any known 
association(s) between a trait and a genotype or a haplotype for the gene. 

Polymorphism Database - A collection of polymorphism data arranged in a systematic or 
methodical way and capable of being individually accessed by electronic or other means. 

Polynucleotide - A nucleic acid molecule comprised of single-stranded RNA or DNA or 
comprised of complementary, double-stranded DNA. 

Population Group - A group of individuals sharing a common ethnogeographic origin.

Reference Population - A group of subjects or individuals who are predicted to be
representative of the genetic variation found in the general population. Typically, the reference
population represents the genetic variation in the population at a certainty level of at least 85%,
preferably at least 90%, more preferably at least 95% and even more preferably at least 99%.

Single Nucleotide Polymorphism (SNP) - Typically, the specific pair of nucleotides observed 
at a single polymorphic site. In rare cases, three or four nucleotides may be found.

Subject - A human individual whose genotypes or haplotypes or response to treatment or
disease state are to be determined.

Treatment - A stimulus administered internally or externally to a subject. 

Unphased — As applied to a sequence of nucleotide pairs for two or more polymorphic sites in 
a locus, unphased means the combination of nucleotides present at those polymorphic sites on a single 
copy of the locus is not known. 
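
By way of illustration only, the distinction between unphased and phased data drawn in the
definitions above may be sketched in Python. The function name and the two-site example genotype
are hypothetical and are not taken from the FY data disclosed herein; the sketch merely shows that
an unphased genotype that is heterozygous at two sites is consistent with more than one haplotype pair.

```python
from itertools import product


def consistent_haplotype_pairs(genotype):
    """Enumerate every haplotype pair consistent with an unphased genotype.

    genotype: list of 2-tuples, one nucleotide pair per polymorphic site,
              listed 5' to 3'. (Hypothetical representation.)
    Returns a sorted list of (haplotype, haplotype) tuples.
    """
    pairs = set()
    # At each site, choose which nucleotide of the pair lies on the first copy;
    # the other nucleotide then lies on the second copy.
    for choice in product((0, 1), repeat=len(genotype)):
        hap_a = "".join(site[c] for site, c in zip(genotype, choice))
        hap_b = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # Sort within the pair so that (a, b) and (b, a) count as one pair.
        pairs.add(tuple(sorted((hap_a, hap_b))))
    return sorted(pairs)


# Heterozygous at both sites: the phase is ambiguous (two candidate pairs).
print(consistent_haplotype_pairs([("A", "G"), ("C", "T")]))
# Heterozygous at one site only: the phase is fully determined (one pair).
print(consistent_haplotype_pairs([("A", "A"), ("C", "T")]))
```

In this sketch a genotype is phased once a single haplotype pair remains; otherwise additional
information, such as direct haplotyping of one copy, is needed to choose among the candidates.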

As discussed above, information on the identity of genotypes and haplotypes for the FY gene 
of any particular individual as well as the frequency of such genotypes and haplotypes in any particular 
population of individuals is useful for a variety of drug discovery and development applications. Thus, 
the invention also provides compositions and methods for detecting the novel FY polymorphisms, 
haplotypes and haplotype pairs identified herein. 

The compositions comprise at least one oligonucleotide for detecting the variant nucleotide or
nucleotide pair located at a novel FY polymorphic site in one copy or two copies of the FY gene. Such
oligonucleotides are referred to herein as FY haplotyping oligonucleotides or genotyping
oligonucleotides, respectively, and collectively as FY oligonucleotides. In one embodiment, a FY
haplotyping or genotyping oligonucleotide is a probe or primer capable of hybridizing to a target
region that contains, or that is located close to, one of the novel polymorphic sites described herein.

As used herein, the term "oligonucleotide" refers to a polynucleotide molecule having less 
than about 100 nucleotides. A preferred oligonucleotide of the invention is 10 to 35 nucleotides long. 
More preferably, the oligonucleotide is between 15 and 30, and most preferably, between 20 and 25 
nucleotides in length. The exact length of the oligonucleotide will depend on many factors that are 
routinely considered and practiced by the skilled artisan. The oligonucleotide may be comprised of 
any phosphorylation state of ribonucleotides, deoxyribonucleotides, and acyclic nucleotide derivatives, 
and other functionally equivalent derivatives. Alternatively, oligonucleotides may have a phosphate- 
free backbone, which may be comprised of linkages such as carboxymethyl, acetamidate, carbamate, 
polyamide (peptide nucleic acid (PNA)) and the like (Varma, R. in Molecular Biology and 
Biotechnology, A Comprehensive Desk Reference, Ed. R. Meyers, VCH Publishers, Inc. (1995), pages 
617-620). Oligonucleotides of the invention may be prepared by chemical synthesis using any suitable 
methodology known in the art, or may be derived from a biological sample, for example, by restriction 
digestion. The oligonucleotides may be labeled, according to any technique known in the art, 
including use of radiolabels, fluorescent labels, enzymatic labels, proteins, haptens, antibodies, 
sequence tags and the like. 

Haplotyping or genotyping oligonucleotides of the invention must be capable of specifically 
hybridizing to a target region of a FY polynucleotide. Preferably, the target region is located in a FY 
isogene. As used herein, specific hybridization means the oligonucleotide forms an anti-parallel 
double-stranded structure with the target region under certain hybridizing conditions, while failing to 
form such a structure when incubated with another region in the FY polynucleotide or with a non-FY
polynucleotide under the same hybridizing conditions. Preferably, the oligonucleotide specifically
hybridizes to the target region under conventional high stringency conditions. The skilled artisan can
readily design and test oligonucleotide probes and primers suitable for detecting polymorphisms in the
FY gene using the polymorphism information provided herein in conjunction with the known sequence
information for the FY gene and routine techniques.

A nucleic acid molecule such as an oligonucleotide or polynucleotide is said to be a "perfect"
or "complete" complement of another nucleic acid molecule if every nucleotide of one of the
molecules is complementary to the nucleotide at the corresponding position of the other molecule. A
nucleic acid molecule is "substantially complementary" to another molecule if it hybridizes to that
molecule with sufficient stability to remain in a duplex form under conventional low-stringency
conditions. Conventional hybridization conditions are described, for example, by Sambrook J. et al.,
in Molecular Cloning, A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring
Harbor, NY (1989) and by Haymes, B.D. et al. in Nucleic Acid Hybridization, A Practical Approach,
IRL Press, Washington, D.C. (1985). While perfectly complementary oligonucleotides are preferred
for detecting polymorphisms, departures from complete complementarity are contemplated where such
departures do not prevent the molecule from specifically hybridizing to the target region. For example,
an oligonucleotide primer may have a non-complementary fragment at its 5' end, with the remainder of
the primer being complementary to the target region. Alternatively, non-complementary nucleotides
may be interspersed into the probe or primer as long as the resulting probe or primer is still capable of
specifically hybridizing to the target region.

Preferred haplotyping or genotyping oligonucleotides of the invention are allele-specific
oligonucleotides. As used herein, the term allele-specific oligonucleotide (ASO) means an
oligonucleotide that is able, under sufficiently stringent conditions, to hybridize specifically to one
allele of a gene, or other locus, at a target region containing a polymorphic site while not hybridizing to
the corresponding region in another allele(s). As understood by the skilled artisan, allele-specificity
will depend upon a variety of readily optimized stringency conditions, including salt and formamide
concentrations, as well as temperatures for both the hybridization and washing steps. Examples of
hybridization and washing conditions typically used for ASO probes are found in Kogan et al.,
"Genetic Prediction of Hemophilia A" in PCR Protocols, A Guide to Methods and Applications,
Academic Press, 1990 and Ruano et al., 87 Proc. Natl. Acad. Sci. USA 6296-6300, 1990. Typically, an
ASO will be perfectly complementary to one allele while containing a single mismatch for another
allele.

Allele-specific oligonucleotides of the invention include ASO probes and ASO primers. ASO
probes which usually provide good discrimination between different alleles are those in which a central
position of the oligonucleotide probe aligns with the polymorphic site in the target region (e.g.,
approximately the 7th or 8th position in a 15mer, the 8th or 9th position in a 16mer, and the 10th or 11th
position in a 20mer). An ASO primer of the invention has a 3' terminal nucleotide, or preferably a 3'
penultimate nucleotide, that is complementary to only one nucleotide of a particular SNP, thereby
acting as a primer for polymerase-mediated extension only if the allele containing that nucleotide is
present. ASO probes and primers hybridizing to either the coding or noncoding strand are
contemplated by the invention. ASO probes and primers listed below use the appropriate nucleotide
symbol (R= G or A, Y= T or C, M= A or C, K= G or T, S= G or C, and W= A or T; WIPO standard
ST.25) at the position of the polymorphic site to represent that the ASO contains either of the two
alternative allelic variants observed at that polymorphic site.
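
The symbol convention just described may be illustrated by a minimal Python sketch that expands a
sequence containing one ambiguity symbol into the two allele-specific sequences it represents. The
dictionary and function names are hypothetical; the symbol table is limited to the six two-nucleotide
symbols quoted above, and the input 15-mer is one of the ASO probe sequences listed in this section.

```python
# Two-nucleotide ambiguity symbols exactly as quoted in the text above.
AMBIGUITY = {"R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT"}


def expand_aso(sequence):
    """Return the two allele-specific sequences encoded by one ambiguity symbol.

    Assumes the oligonucleotide contains exactly one ambiguity symbol, which
    marks the polymorphic site; any other input raises ValueError.
    """
    positions = [i for i, base in enumerate(sequence) if base in AMBIGUITY]
    if len(positions) != 1:
        raise ValueError("expected exactly one ambiguity symbol")
    i = positions[0]
    # Substitute each alternative nucleotide at the polymorphic position.
    return [sequence[:i] + allele + sequence[i + 1:] for allele in AMBIGUITY[sequence[i]]]


print(expand_aso("TGTCAGAYCATGTAT"))
# ['TGTCAGATCATGTAT', 'TGTCAGACCATGTAT']
```

In this example the symbol Y falls at the 8th position of the 15mer, which agrees with the central
placement of the polymorphic site described above for ASO probes.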

A preferred ASO probe for detecting FY gene polymorphisms comprises a nucleotide
sequence, listed 5' to 3', selected from the group consisting of:

TGTCAGAYCATGTAT (SEQ
GACACCCRCCAAGCC (SEQ
CACATACRGATATGT (SEQ
CAGCAAAYGTACACA (SEQ
ACGCCCAMGTGCACA (SEQ
CAGAGTTSACCACCA (SEQ
CACCTTTYTCCCAAA (SEQ
TCTCCCTYTCCACTT (SEQ
CCCTTCCYGCTTTTT (SEQ
TTTCCTTYTCTCCTT (SEQ
CCCCTCARTTCCCAG (SEQ
ACTCTTCYGGTGTAA (SEQ
CTTCATCYTCACCAG (SEQ
TACAGCAYGGAGCTG (SEQ
ACAGCACRGAGCTGA (SEQ
GGATGGTYTTCTCAT (SEQ

A preferred ASO primer for detecting FY gene polymorphisms comprises a nucleotide
sequence, listed 5' to 3', selected from the group consisting of:

TTAACTTGTCAGAYC (SEQ ID NO: 20)
CACCCAGACACCCRC (SEQ ID NO: 22)
GCCCCTCACATACRG (SEQ ID NO: 24)
GATACACAGCAAAYG (SEQ ID NO: 26)
GAGCTCACGCCCAMG (SEQ ID NO: 28)
TTGGGACAGAGTTSA (SEQ ID NO: 30)
CACCACCACCTTTYT (SEQ ID NO: 32)
TCTCAATCTCCCTYT (SEQ ID NO: 34)
TTTTCTCCCTTCCYG (SEQ ID NO: 36)
AGTCTTTTTCCTTYT (SEQ ID NO: 38)
CACCTGCCCCTCART (SEQ ID NO: 40)
CAGGAGACTCTTCYG (SEQ ID NO: 42)
GCCCTTCTTCATCYT (SEQ ID NO: 44)
CTGATATACAGCAYG (SEQ ID NO: 46)
TGATATACAGCACRG (SEQ ID NO: 48)
CCTGAAGGATGGTYT (SEQ ID NO: 50)

AGTGGAATACATGRT
GTGAGGGGCTTGGYG
TTGTGCACATATCYG
GAACTCTGTGTACRT
GGGGTGTGTGCACKT
AGGTGGTGGTGGTSA
CATGTGTTTGGGARA
TTACCGAAGTGGARA
AGAGGAAAAAAGCRG
CATAGGAAGGAGARA
AGTCTCCTGGGAAYT
TCAGAGTTACACCRG
AGGACACTGGTGARG
AGCCTTCAGCTCCRT
AAGCCTTCAGCTCYG

Other oligonucleotides of the invention hybridize to a target region located one to several
nucleotides downstream of one of the novel polymorphic sites identified herein. Such oligonucleotides
are useful in polymerase-mediated primer extension methods for detecting one of the novel
polymorphisms described herein and therefore such oligonucleotides are referred to herein as
"primer-extension oligonucleotides". In a preferred embodiment, the 3' terminus of a primer-extension
oligonucleotide is a deoxynucleotide complementary to the nucleotide located immediately adjacent to
the polymorphic site.

A particularly preferred oligonucleotide primer for detecting FY gene polymorphisms by
primer extension terminates in a nucleotide sequence, listed 5' to 3', selected from the group consisting
of:

ACTTGTCAGA (SEQ ID NO: 52); GGAATACATG (SEQ ID NO: 53);
CCAGACACCC (SEQ ID NO: 54); AGGGGCTTGG (SEQ ID NO: 55);
CCTCACATAC (SEQ ID NO: 56); TGCACATATC (SEQ ID NO: 57);
ACACAGCAAA (SEQ ID NO: 58); CTCTGTGTAC (SEQ ID NO: 59);
CTCACGCCCA (SEQ ID NO: 60); GTGTGTGCAC (SEQ ID NO: 61);
GGACAGAGTT (SEQ ID NO: 62); TGGTGGTGGT (SEQ ID NO: 63);
CACCACCTTT (SEQ ID NO: 64); GTGTTTGGGA (SEQ ID NO: 65);
CAATCTCCCT (SEQ ID NO: 66); CCGAAGTGGA (SEQ ID NO: 67);
TCTCCCTTCC (SEQ ID NO: 68); GGAAAAAAGC (SEQ ID NO: 69);
CTTTTTCCTT (SEQ ID NO: 70); AGGAAGGAGA (SEQ ID NO: 71);
CTGCCCCTCA (SEQ ID NO: 72); CTCCTGGGAA (SEQ ID NO: 73);
GAGACTCTTC (SEQ ID NO: 74); GAGTTACACC (SEQ ID NO: 75);
CTTCTTCATC (SEQ ID NO: 76); ACACTGGTGA (SEQ ID NO: 77);
ATATACAGCA (SEQ ID NO: 78); CTTCAGCTCC (SEQ ID NO: 79);
TATACAGCAC (SEQ ID NO: 80); CCTTCAGCTC (SEQ ID NO: 81);
GAAGGATGGT (SEQ ID NO: 82); and CAGATGAGAA (SEQ ID NO



In some embodiments, a composition contains two or more differently labeled FY
oligonucleotides for simultaneously probing the identity of nucleotides or nucleotide pairs at two or
more polymorphic sites. It is also contemplated that primer compositions may contain two or more
sets of allele-specific primer pairs to allow simultaneous targeting and amplification of two or more
regions containing a polymorphic site.

FY oligonucleotides of the invention may also be immobilized on or synthesized on a solid
surface such as a microchip, bead, or glass slide (see, e.g., WO 98/20020 and WO 98/20019). Such
immobilized oligonucleotides may be used in a variety of polymorphism detection assays, including
but not limited to probe hybridization and polymerase extension assays. Immobilized FY
oligonucleotides of the invention may comprise an ordered array of oligonucleotides designed to
rapidly screen a DNA sample for polymorphisms in multiple genes at the same time.

In another embodiment, the invention provides a kit comprising at least two FY
oligonucleotides packaged in separate containers. The kit may also contain other components such as
hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate
container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit
may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer
extension mediated by the polymerase, such as PCR.

The above described oligonucleotide compositions and kits are useful in methods for
genotyping and/or haplotyping the FY gene in an individual. As used herein, the terms "FY genotype"
and "FY haplotype" mean the genotype or haplotype contains the nucleotide pair or nucleotide,
respectively, that is present at one or more of the novel polymorphic sites described herein and may
optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic
sites in the FY gene. The additional polymorphic sites may be currently known polymorphic sites or
sites that are subsequently discovered.

One embodiment of a genotyping method of the invention involves isolating from the
individual a nucleic acid sample comprising the two copies of the FY gene, mRNA transcripts thereof
or cDNA copies thereof, or a fragment of any of the foregoing, that are present in the individual, and
determining the identity of the nucleotide pair at one or more polymorphic sites selected from the
group consisting of PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18,
PS19 and PS20 in the two copies to assign a FY genotype to the individual. As will be readily
understood by the skilled artisan, the two "copies" of a gene, mRNA or cDNA (or fragment of such
FY molecules) in an individual may be the same allele or may be different alleles. In a preferred
embodiment of the method for assigning a FY genotype, the identity of the nucleotide pair at one or
more of the polymorphic sites selected from the group consisting of PS9, PS14, PS16 and PS17 is also
determined. In another embodiment, a genotyping method of the invention comprises determining the
identity of the nucleotide pair at each of PS1-PS20.

Typically, the nucleic acid sample is isolated from a biological sample taken from the 
individual, such as a blood sample or tissue sample. Suitable tissue samples include whole blood, 
semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. The nucleic acid sample may 
be comprised of genomic DNA, mRNA, or cDNA and, in the latter two cases, the biological sample 
must be obtained from a tissue in which the FY gene is expressed. Furthermore it will be understood 
by the skilled artisan that mRNA or cDNA preparations would not be used to detect polymorphisms 
located in introns or in 5' and 3' untranslated regions if not present in the mRNA or cDNA. If a FY
gene fragment is isolated, it must contain the polymorphic site(s) to be genotyped. 

One embodiment of a haplotyping method of the invention comprises isolating from the
individual a nucleic acid sample containing only one of the two copies of the FY gene, mRNA or
cDNA, or a fragment of such FY molecules, that is present in the individual and determining in that
copy the identity of the nucleotide at one or more polymorphic sites selected from the group consisting
of PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20
to assign a FY haplotype to the individual.

The nucleic acid used in the above haplotyping methods of the invention may be isolated using
any method capable of separating the two copies of the FY gene or fragment such as one of the
methods described above for preparing FY isogenes, with targeted in vivo cloning being the preferred
approach. As will be readily appreciated by those skilled in the art, any individual clone will typically
only provide haplotype information on one of the two FY gene copies present in an individual. If
haplotype information is desired for the individual's other copy, additional FY clones will usually need
to be examined. Typically, at least five clones should be examined to have more than a 90%
probability of haplotyping both copies of the FY gene in an individual. In some cases, however, once
the haplotype for one FY allele is directly determined, the haplotype for the other allele may be
inferred if the individual has a known genotype for the polymorphic sites of interest or if the haplotype
frequency or haplotype pair frequency for the individual's population group is known. In some
embodiments, the FY haplotype is assigned to the individual by also identifying the nucleotide at one
or more polymorphic sites selected from the group consisting of PS9, PS14, PS16 and PS17. In a
particularly preferred embodiment, the nucleotide at each of PS1-PS20 is identified.
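
The five-clone figure stated above may be checked with a short calculation, sketched here in Python.
The sketch rests on assumptions that are not stated in the text: each clone is taken to derive from
either of the two gene copies with equal probability, and clones are taken to be independent of one
another. The function name is hypothetical.

```python
def prob_both_copies_sampled(n_clones):
    """Probability that n independent clones include both copies of the gene.

    Assumption (not from the text): each clone derives from either copy with
    probability 0.5, independently of the other clones.
    """
    if n_clones < 2:
        return 0.0
    # All n clones come from the same copy with probability 2 * 0.5**n.
    return 1.0 - 2 * 0.5 ** n_clones


for n in range(2, 7):
    print(n, prob_both_copies_sampled(n))
# 4 clones give 0.875; 5 clones give 0.9375
```

Under these assumptions four clones fall short of 90% while five clones exceed it, which is
consistent with the threshold given above.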

In another embodiment, the haplotyping method comprises determining whether an individual
has one or more of the FY haplotypes shown in Table 4. This can be accomplished by identifying, for
one or both copies of the individual's FY gene, the phased sequence of nucleotides present at each of
PS1-PS20. This identifying step does not necessarily require that each of PS1-PS20 be directly
examined. Typically only a subset of PS1-PS20 will need to be directly examined to assign to an
individual one or more of the haplotypes shown in Table 4. This is because at least one polymorphic
site in a gene is frequently in strong linkage disequilibrium with one or more other polymorphic sites
in that gene (Drysdale, CM et al. 2000 PNAS 97:10483-10488; Rieder MJ et al. 1999 Nature Genetics
22:59-62). Two sites are said to be in linkage disequilibrium if the presence of a particular variant at
one site enhances the predictability of another variant at the second site (Stephens, JC 1999, Mol.
Diag. 4:309-317). Techniques for determining whether any two polymorphic sites are in linkage
disequilibrium are well-known in the art (Weir B.S. 1996 Genetic Data Analysis II, Sinauer
Associates, Inc. Publishers, Sunderland, MA).
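
As a minimal sketch of one such technique, the following Python function computes two commonly used
pairwise measures of linkage disequilibrium, the coefficient D and the squared correlation r^2, from
counts of two-site haplotypes. The text does not specify which measure is to be used; the choice of
D and r^2, the function name, and the example counts are assumptions made for illustration only and
are not FY data.

```python
def ld_stats(hap_counts, allele1, allele2):
    """Return (D, r^2) for two biallelic polymorphic sites.

    hap_counts: dict mapping a two-character haplotype (site 1 nucleotide
                followed by site 2 nucleotide) to its count in a population.
    allele1, allele2: the reference nucleotide at site 1 and at site 2.
    """
    total = sum(hap_counts.values())
    # Marginal frequencies of the reference nucleotide at each site.
    p_a = sum(n for h, n in hap_counts.items() if h[0] == allele1) / total
    p_b = sum(n for h, n in hap_counts.items() if h[1] == allele2) / total
    # Frequency of the haplotype carrying both reference nucleotides.
    p_ab = hap_counts.get(allele1 + allele2, 0) / total
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Hypothetical counts: only two of the four possible haplotypes are observed,
# so the nucleotide at one site fully predicts the nucleotide at the other.
print(ld_stats({"AC": 50, "GT": 50}, "A", "C"))
# All four haplotypes equally frequent: the sites are in linkage equilibrium.
print(ld_stats({"AC": 25, "AT": 25, "GC": 25, "GT": 25}, "A", "C"))
```

When r^2 is close to 1, examining one of the two sites is largely sufficient to infer the other,
which is why only a subset of polymorphic sites may need to be examined directly.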

In another embodiment of a haplotyping method of the invention, a FY haplotype pair is
determined for an individual by identifying the phased sequence of nucleotides at one or more
polymorphic sites selected from the group consisting of PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8,
PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20 in each copy of the FY gene that is present in
the individual. In a particularly preferred embodiment, the haplotyping method comprises identifying
the phased sequence of nucleotides at each of PS1-PS20 in each copy of the FY gene.

When haplotyping both copies of the gene, the identifying step is preferably performed with
each copy of the gene being placed in separate containers. However, it is also envisioned that if the
two copies are labeled with different tags, or are otherwise separately distinguishable or identifiable, it
could be possible in some cases to perform the method in the same container. For example, if first and
second copies of the gene are labeled with different first and second fluorescent dyes, respectively, and
an allele-specific oligonucleotide labeled with yet a third different fluorescent dye is used to assay the
polymorphic site(s), then detecting a combination of the first and third dyes would identify the
polymorphism in the first gene copy while detecting a combination of the second and third dyes would
identify the polymorphism in the second gene copy.

In both the genotyping and haplotyping methods, the identity of a nucleotide (or nucleotide
pair) at a polymorphic site(s) may be determined by amplifying a target region(s) containing the
polymorphic site(s) directly from one or both copies of the FY gene, or a fragment thereof, and
determining the sequence of the amplified region(s) by conventional methods. It will be readily
appreciated by the skilled artisan that only one nucleotide will be detected at a polymorphic site in
individuals who are homozygous at that site, while two different nucleotides will be detected if the
individual is heterozygous for that site. The polymorphism may be identified directly, known as
positive-type identification, or by inference, referred to as negative-type identification. For example,
where a SNP is known to be guanine and cytosine in a reference population, a site may be positively
determined to be either guanine or cytosine for an individual homozygous at that site, or both guanine
and cytosine, if the individual is heterozygous at that site. Alternatively, the site may be negatively
determined to be not guanine (and thus cytosine/cytosine) or not cytosine (and thus guanine/guanine).

The target region(s) may be amplified using any oligonucleotide-directed amplification
method, including but not limited to polymerase chain reaction (PCR) (U.S. Patent No. 4,965,188),
ligase chain reaction (LCR) (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193, 1991;
WO90/01069), and oligonucleotide ligation assay (OLA) (Landegren et al., Science 241:1077-1080,
1988). Other known nucleic acid amplification procedures may be used to amplify the target region
including transcription-based amplification systems (U.S. Patent No. 5,130,238; EP 329,822; U.S.
Patent No. 5,169,766, WO89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci.
USA 89:392-396, 1992).

A polymorphism in the target region may also be assayed before or after amplification using
one of several hybridization-based methods known in the art. Typically, allele-specific
oligonucleotides are utilized in performing such methods. The allele-specific oligonucleotides may be
used as differently labeled probe pairs, with one member of the pair showing a perfect match to one
variant of a target sequence and the other member showing a perfect match to a different variant. In
some embodiments, more than one polymorphic site may be detected at once using a set of
allele-specific oligonucleotides or oligonucleotide pairs. Preferably, the members of the set have melting
temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of
the polymorphic sites being detected.
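
The melting-temperature preference above may be screened for with a rough calculation, sketched
here in Python. The text does not state how melting temperatures are to be determined; the sketch
assumes the Wallace rule of thumb for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in °C, which
is only an approximation, and the function names are hypothetical. The example sequences are
derived from ASO probe sequences listed in this section.

```python
def wallace_tm(oligo):
    """Estimate melting temperature in degrees C with the Wallace rule.

    Assumption: Tm = 2*(A+T) + 4*(G+C), a rough rule for short oligonucleotides.
    """
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    if at + gc != len(oligo):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


def within_tm_window(oligos, window=5):
    """True if all estimated melting temperatures lie within `window` degrees C."""
    tms = [wallace_tm(oligo) for oligo in oligos]
    return max(tms) - min(tms) <= window


# The two allelic forms of one 15-mer probe differ by a single nucleotide.
pair = ["TGTCAGATCATGTAT", "TGTCAGACCATGTAT"]
print([wallace_tm(o) for o in pair], within_tm_window(pair, window=2))
# A probe for a different, GC-rich site falls outside the 5 degree window.
print(within_tm_window(["TGTCAGATCATGTAT", "GACACCCGCCAAGCC"]))
```

Where a set fails such a screen, the lengths of individual members could be adjusted; a more
accurate estimate would take the hybridization conditions into account.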

Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed
with both entities in solution, or such hybridization may be performed when either the oligonucleotide
or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may
be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin,
salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc.
Allele-specific oligonucleotides may be synthesized directly on the solid support or attached to the solid
support subsequent to synthesis. Solid-supports suitable for use in detection methods of the invention
include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for
example, into wells (as in 96-well plates), slides, sheets, membranes, fibers, chips, dishes, and beads.
The solid support may be treated, coated or derivatized to facilitate the immobilization of the
allele-specific oligonucleotide or target nucleic acid.

The genotype or haplotype for the FY gene of an individual may also be determined by
hybridization of a nucleic acid sample containing one or both copies of the gene, mRNA, cDNA or
fragment(s) thereof, to nucleic acid arrays and subarrays such as described in WO 95/11995. The
arrays would contain a battery of allele-specific oligonucleotides representing each of the polymorphic
sites to be included in the genotype or haplotype.

The identity of polymorphisms may also be determined using a mismatch detection technique,
including but not limited to the RNase protection method using riboprobes (Winter et al., Proc. Natl.
Acad. Sci. USA 82:7575, 1985; Meyers et al., Science 230:1242, 1985) and proteins which recognize
nucleotide mismatches, such as the E. coli mutS protein (Modrich, P. Ann. Rev. Genet. 25:229-253,
1991). Alternatively, variant alleles can be identified by single strand conformation polymorphism
(SSCP) analysis (Orita et al., Genomics 5:874-879, 1989; Humphries et al., in Molecular Diagnosis of
Genetic Diseases, R. Elles, ed., pp. 321-340, 1996) or denaturing gradient gel electrophoresis (DGGE)
(Wartell et al., Nucl. Acids Res. 18:2699-2706, 1990; Sheffield et al., Proc. Natl. Acad. Sci. USA
86:232-236, 1989).

A polymerase-mediated primer extension method may also be used to identify the
polymorphism(s). Several such methods have been described in the patent and scientific literature and
include the "Genetic Bit Analysis" method (WO92/15712) and the ligase/polymerase mediated genetic
bit analysis (U.S. Patent No. 5,679,524). Related methods are disclosed in WO91/02087, WO90/09455,
WO95/17676, U.S. Patent Nos. 5,302,509, and 5,945,283. Extended primers containing a
polymorphism may be detected by mass spectrometry as described in U.S. Patent No. 5,605,798.
Another primer extension method is allele-specific PCR (Ruano et al., Nucl. Acids Res. 17:8392, 1989;
Ruano et al., Nucl. Acids Res. 19:6877-6882, 1991; WO 93/22456; Turki et al., J. Clin. Invest.
95:1635-1641, 1995). In addition, multiple polymorphic sites may be investigated by simultaneously
amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in
Wallace et al. (WO89/10414).

In addition, the identity of the allele(s) present at any of the novel polymorphic sites described
herein may be indirectly determined by haplotyping or genotyping another polymorphic site that is in
linkage disequilibrium with the polymorphic site that is of interest. Polymorphic sites in linkage
disequilibrium with the presently disclosed polymorphic sites may be located in regions of the gene or
in other genomic regions not examined herein. Detection of the allele(s) present at a polymorphic site
in linkage disequilibrium with the novel polymorphic sites described herein may be performed by, but
is not limited to, any of the above-mentioned methods for detecting the identity of the allele at a
polymorphic site.

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In another aspect of the invention, an individual's FY haplotype pair is predicted from its FY 
genotype using information on haplotype pairs known to exist in a reference population. In its broadest 
embodiment, the haplotyping prediction method comprises identifying a FY genotype for the 
individual at two or more FY polymorphic sites described herein, accessing data cont aining FY 
5 haplotype pairs identified in a reference population, and assigning a haplotype pair to the individual 
that is consistent with the genotype data. In one embodiment, the reference haplotype pairs include the 
FY haplotype pairs shown in Table 3. The FY haplotype pair can be assigned by comparing the 
individual's genotype with the genotypes corresponding to the haplotype pairs known to exist in the 
general population or in a specific population group, and determining which haplotype pair is
consistent with the genotype of the individual. In some embodiments, the comparing step may be
performed by visual inspection (for example, by consulting Table 3). When the genotype of the
individual is consistent with more than one haplotype pair, frequency data (such as that presented in 
Table 6) may be used to determine which of these haplotype pairs is most likely to be present in the 
individual. This determination may also be performed in some embodiments by visual inspection, for
example by consulting Table 6. If a particular FY haplotype pair consistent with the genotype of the
individual is more frequent in the reference population than others consistent with the genotype, then 
that haplotype pair with the highest frequency is the most likely to be present in the individual. In 
other embodiments, the comparison may be made by a computer-implemented algorithm with the 
genotype of the individual and the reference haplotype data stored in computer-readable formats. For
example, as described in PCT/US01/12831, filed April 18, 2001, one computer-implemented algorithm
to perform this comparison entails enumerating all possible haplotype pairs which are consistent with
the genotype, accessing frequency data for FY haplotype pairs determined in a reference
population to determine a probability that the individual has a possible haplotype pair, and analyzing
the determined probabilities to assign a haplotype pair to the individual.
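The enumerate, look up, and assign steps of such a comparison can be illustrated with a short Python sketch. This is not the algorithm of PCT/US01/12831; the three-site haplotypes, the reference frequencies, and the function names are hypothetical and are not taken from Table 3 or Table 6.

```python
from itertools import product

def consistent_pairs(genotype):
    """Enumerate all unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of 2-tuples, one per polymorphic site, holding the two observed alleles.
    """
    pairs = set()
    # At each site, either allele may lie on the first chromosome copy.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return pairs

def assign_pair(genotype, reference_freqs):
    """Return the most probable reference haplotype pair and the normalized probabilities."""
    candidates = consistent_pairs(genotype)
    scored = {p: reference_freqs[p] for p in candidates if p in reference_freqs}
    if not scored:
        return None, {}  # no reference pair matches; direct haplotyping would be needed
    total = sum(scored.values())
    probs = {p: f / total for p, f in scored.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical reference haplotype pair frequencies (illustration only).
REFERENCE = {("CAG", "TGG"): 0.30, ("CGG", "TAG"): 0.05, ("CAG", "CAG"): 0.40}
genotype = [("C", "T"), ("A", "G"), ("G", "G")]
best, probs = assign_pair(genotype, REFERENCE)
```

In this illustration the genotype is heterozygous at two sites, so two haplotype pairs are possible, and the pair with the higher reference frequency is assigned.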

Generally, the reference population should be composed of randomly-selected individuals
representing the major ethnogeographic groups of the world. A preferred reference population for use
in the methods of the present invention comprises an approximately equal number of individuals from
Caucasian, African-descent, Asian and Hispanic-Latino population groups with the minimum number
of each group being chosen based on how rare a haplotype one wants to be guaranteed to see. For
example, if one wants to have a q% chance of not missing a haplotype that exists in the population at a
frequency of p%, the number of individuals (n) who must be
sampled is given by 2n = log(1-q)/log(1-p), where p and q are expressed as fractions. A preferred
reference population allows the detection of any haplotype whose frequency is at least 10% with about
99% certainty and comprises about 20 unrelated individuals from each of the four population groups
named above. A particularly preferred reference population includes a 3-generation family
representing one or more of the four population groups to serve as controls for checking quality of
haplotyping procedures.
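The sample-size relation above can be checked numerically. The following Python sketch evaluates 2n = log(1-q)/log(1-p) and the corresponding detection probability; the function names are illustrative, and rounding n up to a whole individual is an assumption made here.

```python
import math

def min_individuals(p, q):
    """Smallest whole number n of individuals satisfying 2n >= log(1-q)/log(1-p).

    p: haplotype frequency as a fraction; q: desired detection probability as a fraction.
    """
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("p and q must be fractions strictly between 0 and 1")
    chromosomes = math.log(1 - q) / math.log(1 - p)  # this is 2n in the text
    return math.ceil(chromosomes / 2)

def detection_probability(p, n):
    """Chance that a haplotype of frequency p appears at least once among 2n chromosomes."""
    return 1 - (1 - p) ** (2 * n)

n_needed = min_individuals(0.10, 0.99)         # haplotype at 10%, 99% certainty
certainty_at_20 = detection_probability(0.10, 20)
```

With p = 0.10 and q = 0.99 the formula gives n of about 22, and 20 individuals give a detection probability between 98% and 99%, in line with the "about 20" and "about 99%" figures stated above.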

In a preferred embodiment, the haplotype frequency data for each ethnogeographic group is
examined to determine whether it is consistent with Hardy-Weinberg equilibrium. Hardy-Weinberg
equilibrium (D.L. Hartl et al., Principles of Population Genomics, Sinauer Associates (Sunderland,
MA), 3rd Ed., 1997) postulates that the frequency of finding the haplotype pair $H_1/H_2$ is equal to
$p_{H\text{-}W}(H_1/H_2) = 2p(H_1)p(H_2)$ if $H_1 \neq H_2$ and $p_{H\text{-}W}(H_1/H_2) = p(H_1)p(H_2)$ if $H_1 = H_2$.
A statistically significant difference between the observed and expected haplotype frequencies could
be due to one or more factors including significant inbreeding in the population group, strong selective
pressure on the gene, sampling bias, and/or errors in the genotyping process. If large deviations from
Hardy-Weinberg equilibrium are observed in an ethnogeographic group, the number of individuals in
that group can be increased to see if the deviation is due to a sampling bias. If a larger sample size
does not reduce the difference between observed and expected haplotype pair frequencies, then one
may wish to consider haplotyping the individual using a direct haplotyping method such as, for
example, CLASPER System™ technology (U.S. Patent No. 5,866,404), single molecule dilution (SMD), or
allele-specific long-range PCR (Michalatos-Beloin et al., Nucleic Acids Res. 24:4841-4843, 1996).
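A comparison of observed and Hardy-Weinberg expected haplotype pair frequencies might be sketched in Python as follows. The two-haplotype data set and the function names are hypothetical, and the chi-square statistic is one common choice of test statistic rather than a method prescribed by this description.

```python
from collections import Counter

def haplotype_frequencies(pairs):
    """Estimate individual haplotype frequencies from a list of observed haplotype pairs."""
    counts = Counter(h for pair in pairs for h in pair)
    total = 2 * len(pairs)
    return {h: c / total for h, c in counts.items()}

def hardy_weinberg_expected(freqs):
    """Expected pair frequencies: 2*p(H1)*p(H2) if H1 != H2, p(H1)*p(H2) if H1 == H2."""
    haps = sorted(freqs)
    expected = {}
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            factor = 1 if h1 == h2 else 2
            expected[(h1, h2)] = factor * freqs[h1] * freqs[h2]
    return expected

def chi_square_statistic(pairs):
    """Sum of (observed - expected)^2 / expected over all possible haplotype pairs."""
    n = len(pairs)
    observed = Counter(tuple(sorted(p)) for p in pairs)
    expected = hardy_weinberg_expected(haplotype_frequencies(pairs))
    return sum((observed.get(k, 0) - n * e) ** 2 / (n * e) for k, e in expected.items())

# Hypothetical population in exact equilibrium, and one with no heterozygotes at all.
in_equilibrium = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
no_heterozygotes = [("A", "A")] * 50 + [("B", "B")] * 50
stat_eq = chi_square_statistic(in_equilibrium)
stat_dev = chi_square_statistic(no_heterozygotes)
```

The statistic is zero for the first population and large for the second; whether a given value is statistically significant depends on the degrees of freedom and significance level chosen by the investigator.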

In one embodiment of this method for predicting a FY haplotype pair for an individual, the
assigning step involves performing the following analysis. First, each of the possible haplotype pairs
is compared to the haplotype pairs in the reference population. Generally, only one of the haplotype
pairs in the reference population matches a possible haplotype pair and that pair is assigned to the
individual. Occasionally, only one haplotype represented in the reference haplotype pairs is consistent
with a possible haplotype pair for an individual, and in such cases the individual is assigned a
haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known
haplotype from the possible haplotype pair. Alternatively, the haplotype pair in an individual may be
predicted from the individual's genotype for that gene using reported methods (e.g., Clark et al. 1990
Mol Biol Evol 7:111-22; copending PCT/US01/12831 filed April 18, 2001) or through a commercial
haplotyping service such as offered by Genaissance Pharmaceuticals, Inc. (New Haven, CT). In rare
cases, either no haplotypes in the reference population are consistent with the possible haplotype pairs,
or alternatively, multiple reference haplotype pairs are consistent with the possible haplotype pairs. In
such cases, the individual is preferably haplotyped using a direct molecular haplotyping method such
as, for example, CLASPER System™ technology (U.S. Patent No. 5,866,404), SMD, or allele-specific
long-range PCR (Michalatos-Beloin et al., supra).
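The step of deriving a new haplotype by subtracting a known haplotype from the genotype can be sketched as below. The function name and the three-site example are hypothetical; the sketch assumes a genotype given as one pair of alleles per polymorphic site.

```python
def subtract_haplotype(genotype, known):
    """Return the haplotype left after removing `known` from an unphased genotype.

    genotype: list of 2-tuples of alleles, one per polymorphic site.
    known: string with one allele per site.
    Returns None if `known` is not consistent with the genotype.
    """
    if len(genotype) != len(known):
        raise ValueError("genotype and haplotype must cover the same sites")
    remainder = []
    for (a, b), allele in zip(genotype, known):
        if allele == a:
            remainder.append(b)   # the other allele belongs to the second haplotype
        elif allele == b:
            remainder.append(a)
        else:
            return None           # known haplotype carries an allele not in the genotype
    return "".join(remainder)

genotype = [("C", "T"), ("A", "G"), ("G", "G")]
derived = subtract_haplotype(genotype, "CAG")
```

Here the known haplotype CAG leaves TGG as the second member of the pair, and a haplotype carrying an allele absent from the genotype is rejected.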

The invention also provides a method for determining the frequency of a FY genotype,
haplotype, or haplotype pair in a population. The method comprises, for each member of the
population, determining the genotype or the haplotype pair for the novel FY polymorphic sites
described herein, and calculating the frequency with which any particular genotype, haplotype, or
haplotype pair is found in the population. The population may be, e.g., a reference population, a
family population, a same gender population, a population group, or a trait population (e.g., a group
of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic
treatment).

In another aspect of the invention, frequency data for FY genotypes, haplotypes, and/or
haplotype pairs are determined in a reference population and used in a method for identifying an 
association between a trait and a FY genotype, haplotype, or haplotype pair. The trait may be any 
detectable phenotype, including but not limited to susceptibility to a disease or response to a treatment. 
In one embodiment, the method involves obtaining data on the frequency of the genotype(s), 
haplotype(s), or haplotype pair(s) of interest in a reference population as well as in a population 
exhibiting the trait. Frequency data for one or both of the reference and trait populations may be 
obtained by genotyping or haplotyping each individual in the populations using one or more of the 
methods described above. The haplotypes for the trait population may be determined directly or, 
alternatively, by a predictive genotype to haplotype approach as described above. In another 
embodiment, the frequency data for the reference and/or trait populations is obtained by accessing 
previously determined frequency data, which may be in written or electronic form. For example, the 
frequency data may be present in a database that is accessible by a computer. Once the frequency data 
is obtained, the frequencies of the genotype(s), haplotype(s), or haplotype pair(s) of interest in the 
reference and trait populations are compared. In a preferred embodiment, the frequencies of all 
genotypes, haplotypes, and/or haplotype pairs observed in the populations are compared. If a 
particular FY genotype, haplotype, or haplotype pair is more frequent in the trait population than in the
reference population by a statistically significant amount, then the trait is predicted to be associated
with that FY genotype, haplotype or haplotype pair. Preferably, the FY genotype, haplotype, or
haplotype pair being compared in the trait and reference populations is selected from the
full-genotypes and full-haplotypes shown in Tables 3 and 4, or from sub-genotypes and sub-haplotypes
derived from these genotypes and haplotypes. Sub-genotypes useful in the invention preferably do
not include sub-genotypes solely for any one of PS9, PS14, PS16 and PS17 or for any combination
thereof.
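One way to test whether a haplotype is more frequent in the trait population by a statistically significant amount is a one-sided Fisher's exact test on a 2x2 table of carriers and non-carriers. The description does not prescribe a particular test; the choice of Fisher's test, the counts, and the function name below are assumptions made for illustration.

```python
from math import comb

def fisher_exact_greater(trait_with, trait_without, ref_with, ref_without):
    """One-sided Fisher's exact p-value for enrichment of carriers in the trait population.

    Returns the hypergeometric probability of seeing at least `trait_with` carriers
    in the trait population if carrier status were unrelated to the trait.
    """
    row1 = trait_with + trait_without      # size of the trait population
    col1 = trait_with + ref_with           # total carriers in both populations
    n = row1 + ref_with + ref_without      # all individuals
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(trait_with, upper + 1))
    return tail / comb(n, row1)

# Hypothetical counts: 18 of 40 trait individuals and 8 of 40 reference individuals
# carry the haplotype of interest.
trait_freq = 18 / 40
ref_freq = 8 / 40
p_value = fisher_exact_greater(18, 22, 8, 32)
```

A small p-value would support an association between the haplotype and the trait; the significance threshold, and any correction for testing many haplotypes at once, are left to the investigator.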

In a preferred embodiment of the method, the trait of interest is a clinical response exhibited 
by a patient to some therapeutic treatment, for example, response to a drug targeting FY or response to 
a therapeutic treatment for a medical condition. As used herein, "medical condition" includes but is
not limited to any condition or disease manifested as one or more physical and/or psychological 
symptoms for which treatment is desirable, and includes previously and newly identified diseases and 
other disorders. As used herein the term "clinical response" means any or all of the following: a 
quantitative measure of the response, no response, and/or adverse response (i.e., side effects). 

In order to deduce a correlation between clinical response to a treatment and a FY genotype, 
haplotype, or haplotype pair, it is necessary to obtain data on the clinical responses exhibited by a 
population of individuals who received the treatment, hereinafter the "clinical population". This 
clinical data may be obtained by analyzing the results of a clinical trial that has already been run and/or 
the clinical data may be obtained by designing and carrying out one or more new clinical trials. As 
used herein, the term "clinical trial" means any research study designed to collect clinical data on
responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III
clinical trials. Standard methods are used to define the patient population and to enroll subjects.

It is preferred that the individuals included in the clinical population have been graded for the 
existence of the medical condition of interest. This is important in cases where the symptom(s) being 
presented by the patients can be caused by more than one underlying condition, and where the treatments for
the underlying conditions are not the same. An example of this would be where patients experience
breathing difficulties that are due to either asthma or respiratory infections. If both sets were treated 
with an asthma medication, there would be a spurious group of apparent non-responders that did not 
actually have asthma. These people would affect the ability to detect any correlation between 
haplotype and treatment outcome. This grading of potential patients could employ a standard physical 
exam or one or more lab tests. Alternatively, grading of patients could use haplotyping for situations 
where there is a strong correlation between haplotype pair and disease susceptibility or severity. 

The therapeutic treatment of interest is administered to each individual in the trial population 
and each individual's response to the treatment is measured using one or more predetermined criteria. 
It is contemplated that in many cases, the trial population will exhibit a range of responses and that the 
investigator will choose the number of responder groups (e.g., low, medium, high) made up by the 
various responses. In addition, the FY gene for each individual in the trial population is genotyped 
and/or haplotyped, which may be done before or after administering the treatment. 

After both the clinical and polymorphism data have been obtained, correlations between 
individual response and FY genotype or haplotype content are created. Correlations may be produced 
in several ways. In one method, individuals are grouped by their FY genotype or haplotype (or 
haplotype pair) (also referred to as a polymorphism group), and then the averages and standard 
deviations of clinical responses exhibited by the members of each polymorphism group are calculated. 
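The grouping and summary calculation described in this paragraph can be sketched in Python with the standard library. The haplotype pair labels and response values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_group(records):
    """Group clinical responses by polymorphism group and return (n, mean, std dev) per group.

    records: iterable of (group_label, response_value) tuples.
    """
    groups = defaultdict(list)
    for group, response in records:
        groups[group].append(response)
    summary = {}
    for group, values in groups.items():
        # The sample standard deviation is undefined for a single observation.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[group] = (len(values), mean(values), sd)
    return summary

# Hypothetical responses, keyed by haplotype pair.
records = [
    ("H1/H1", 10.0), ("H1/H1", 12.0),
    ("H1/H2", 20.0), ("H1/H2", 22.0), ("H1/H2", 24.0),
]
summary = summarize_by_group(records)
```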

These results are then analyzed to determine if any observed variation in clinical response 
between polymorphism groups is statistically significant. Statistical analysis methods which may be 
used are described in L.D. Fisher and G. vanBelle, "Biostatistics: A Methodology for the Health
Sciences", Wiley-Interscience (New York) 1993. This analysis may also include a regression
calculation of which polymorphic sites in the FY gene give the most significant contribution to the
differences in phenotype. One regression model useful in the invention is described in WO 01/01218,
entitled "Methods for Obtaining and Using Haplotype Data". 

A second method for finding correlations between FY haplotype content and clinical responses 
uses predictive models based on error-minimizing optimization algorithms. One of many possible 
optimization algorithms is a genetic algorithm (R. Judson, "Genetic Algorithms and Their Uses in
Chemistry" in Reviews in Computational Chemistry, Vol. 10, pp. 1-73, K. B. Lipkowitz and D. B.
Boyd, eds. (VCH Publishers, New York, 1997)). Simulated annealing (Press et al., "Numerical Recipes
in C: The Art of Scientific Computing", Cambridge University Press (Cambridge) 1992, Ch. 10),
neural networks (E. Rich and K. Knight, "Artificial Intelligence", 2nd Edition (McGraw-Hill, New
York, 1991), Ch. 18), standard gradient descent methods (Press et al., supra, Ch. 10), or other global or
local optimization approaches (see discussion in Judson, supra) could also be used. Preferably, the 
correlation is found using a genetic algorithm approach as described in WO 01/01218. 

Correlations may also be analyzed using analysis of variance (ANOVA) techniques to
determine how much of the variation in the clinical data is explained by different subsets of the
polymorphic sites in the FY gene. As described in WO 01/01218, ANOVA is used to test hypotheses
about whether a response variable is caused by or correlated with one or more traits or variables that
can be measured (Fisher and vanBelle, supra, Ch. 10).
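A one-way ANOVA of this kind reduces to comparing between-group and within-group variation. The following Python sketch computes the F statistic and the fraction of variation explained by group membership; the response values are hypothetical and the sketch is not the procedure of WO 01/01218.

```python
def one_way_anova(groups):
    """Return (F statistic, fraction of total variation explained by group membership).

    groups: list of lists of response values, one list per polymorphism group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual responses around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    explained = ss_between / (ss_between + ss_within)
    return f_stat, explained

# Hypothetical responses for three polymorphism groups.
f_stat, explained = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The F statistic would then be compared against the F distribution with (k-1, n-k) degrees of freedom to judge significance.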

From the analyses described above, a mathematical model may be readily constructed by the 
skilled artisan that predicts clinical response as a function of FY genotype or haplotype content. 
Preferably, the model is validated in one or more follow-up clinical trials designed to test the model. 

The identification of an association between a clinical response and a genotype or haplotype 
(or haplotype pair) for the FY gene may be the basis for designing a diagnostic method to determine 
those individuals who will or will not respond to the treatment, or alternatively, will respond at a lower 
level and thus may require more treatment, i.e., a greater dose of a drug. The diagnostic method may 
take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more 
of the polymorphic sites in the FY gene), a serological test, or a physical exam measurement. The only 
requirement is that there be a good correlation between the diagnostic test results and the underlying 
FY genotype or haplotype that is in turn correlated with the clinical response. In a preferred 
embodiment, this diagnostic method uses the predictive haplotyping method described above. 

In another embodiment, the invention provides an isolated polynucleotide comprising a 
polymorphic variant of the FY gene or a fragment of the gene which contains at least one of the novel 
polymorphic sites described herein. The nucleotide sequence of a variant FY gene is identical to the 
reference genomic sequence for those portions of the gene examined, as described in the Examples 
below, except that it comprises a different nucleotide at one or more of the novel polymorphic sites 
PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20, and
may also comprise one or more additional polymorphisms selected from the group consisting of
cytosine at PS9, guanine at PS14, thymine at PS16 and adenine at PS17. Similarly, the nucleotide
sequence of a variant fragment of the FY gene is identical to the corresponding portion of the reference 
sequence except for having a different nucleotide at one or more of the novel polymorphic sites 
described herein. Thus, the invention specifically does not include polynucleotides comprising a 
nucleotide sequence identical to the reference sequence of the FY gene (or other reported FY 
sequences) or to portions of the reference sequence (or other reported FY sequences), except for the 
haplotyping and genotyping oligonucleotides described above.

The location of a polymorphism in a variant FY gene or fragment is preferably identified by
aligning its sequence against SEQ ID NO:1. The polymorphism is selected from the group consisting
of thymine at PS1, adenine at PS2, guanine at PS3, thymine at PS4, adenine at PS5, cytosine at PS6,
thymine at PS7, cytosine at PS8, thymine at PS10, thymine at PS11, guanine at PS12, thymine at PS13,
thymine at PS15, thymine at PS18, adenine at PS19 and thymine at PS20. In a preferred embodiment,
the polymorphic variant comprises a naturally-occurring isogene of the FY gene which is defined by
any one of haplotypes 1-23 shown in Table 4 below.

Polymorphic variants of the invention may be prepared by isolating a clone containing the FY
gene from a human genomic library. The clone may be sequenced to determine the identity of the
nucleotides at the novel polymorphic sites described herein. Any particular variant or fragment
thereof that is claimed herein could be prepared from this clone by performing in vitro mutagenesis
using procedures well-known in the art. Any particular FY variant or fragment thereof may also be
prepared using synthetic or semi-synthetic methods known in the art.

FY isogenes, or fragments thereof, may be isolated using any method that allows separation of 
the two "copies" of the FY gene present in an individual, which, as readily understood by the skilled 
artisan, may be the same allele or different alleles. Separation methods include targeted in vivo cloning 
(TIVC) in yeast as described in WO 98/01573, U.S. Patent No. 5,866,404, and U.S. Patent No.
5,972,614. Another method, which is described in U.S. Patent No. 5,972,614, uses an allele specific
oligonucleotide in combination with primer extension and exonuclease degradation to generate
hemizygous DNA targets. Yet other methods are single molecule dilution (SMD) as described in
Ruano et al., Proc. Natl. Acad. Sci. 87:6296-6300, 1990; and allele specific PCR (Ruano et al., 1989,
supra; Ruano et al., 1991, supra; Michalatos-Beloin et al., supra). 

The invention also provides FY genome anthologies, which are collections of at least two FY
isogenes found in a given population. The population may be any group of at least two individuals,
including but not limited to a reference population, a population group, a family population, a clinical
population, and a same gender population. A FY genome anthology may comprise individual FY
isogenes stored in separate containers such as microtest tubes, separate wells of a microtitre plate and
the like. Alternatively, two or more groups of the FY isogenes in the anthology may be stored in
separate containers. Individual isogenes or groups of such isogenes in a genome anthology may be 
stored in any convenient and stable form, including but not limited to in buffered solutions, as DNA 
precipitates, freeze-dried preparations and the like. A preferred FY genome anthology of the invention 
comprises a set of isogenes defined by the haplotypes shown in Table 4 below. 

An isolated polynucleotide containing a polymorphic variant nucleotide sequence of the
invention may be operably linked to one or more expression regulatory elements in a recombinant
expression vector capable of being propagated and expressing the encoded FY protein in a prokaryotic
or a eukaryotic host cell. Examples of expression regulatory elements which may be used include, but
are not limited to, the lac system, operator and promoter regions of phage lambda, yeast promoters, and
promoters derived from vaccinia virus, adenovirus, retroviruses, or SV40. Other regulatory elements
include, but are not limited to, appropriate leader sequences, termination codons, polyadenylation
signals, and other sequences required for the appropriate transcription and subsequent translation of the
nucleic acid sequence in a given host cell. Of course, the correct combinations of expression
regulatory elements will depend on the host system used. In addition, it is understood that the
expression vector contains any additional elements necessary for its transfer to and subsequent
replication in the host cell. Examples of such elements include, but are not limited to, origins of
replication and selectable markers. Such expression vectors are commercially available or are readily
constructed using methods known to those in the art (e.g., F. Ausubel et al., 1987, in "Current 
Protocols in Molecular Biology", John Wiley and Sons, New York, New York). Host cells which may 
be used to express the variant FY sequences of the invention include, but are not limited to, eukaryotic 
and mammalian cells, such as animal, plant, insect and yeast cells, and prokaryotic cells, such as E.
coli, or algal cells as known in the art. The recombinant expression vector may be introduced into the
host cell using any method known to those in the art including, but not limited to, microinjection, 
electroporation, particle bombardment, transduction, and transfection using DEAE-dextran, 
lipofection, or calcium phosphate (see e.g., Sambrook et al. (1989) in "Molecular Cloning, A 
Laboratory Manual", Cold Spring Harbor Press, Plainview, New York). In a preferred aspect,
eukaryotic expression vectors that function in eukaryotic cells, and preferably mammalian cells, are
used. Non-limiting examples of such vectors include vaccinia virus vectors, adenovirus vectors, 
herpes virus vectors, and baculovirus transfer vectors. Preferred eukaryotic cell lines include COS 
cells, CHO cells, HeLa cells, NIH/3T3 cells, and embryonic stem cells (Thomson, J. A. et al., 1998 
Science 282:1145-1147). Particularly preferred host cells are mammalian cells.

As will be readily recognized by the skilled artisan, expression of polymorphic variants of the
FY gene will produce FY mRNAs varying from each other at any polymorphic site retained in the
spliced and processed mRNA molecules. These mRNAs can be used for the preparation of a FY
cDNA comprising a nucleotide sequence which is a polymorphic variant of the FY reference coding
sequence shown in Figure 2. Thus, the invention also provides FY mRNAs and corresponding cDNAs
which comprise a nucleotide sequence that is identical to SEQ ID NO:2 (Fig. 2) (or its corresponding
RNA sequence) for those regions of SEQ ID NO:2 that correspond to the examined portions of the FY
gene (as described in the Examples below), except for having one or more polymorphisms selected
from the group consisting of thymine at a position corresponding to nucleotide 205, thymine at a
position corresponding to nucleotide 608, adenine at a position corresponding to nucleotide 609 and
thymine at a position corresponding to nucleotide 983, and may also comprise one or more additional
polymorphisms selected from the group consisting of guanine at a position corresponding to nucleotide
131, thymine at a position corresponding to nucleotide 271 and adenine at a position corresponding to
nucleotide 304. A particularly preferred polymorphic cDNA variant comprises the coding sequence of
a FY isogene defined by any one of haplotypes 2-4, 6, 9, 10-12, and 16-18. Fragments of these variant
mRNAs and cDNAs are included in the scope of the invention, provided they contain one or more of
the novel polymorphisms described herein. The invention specifically excludes polynucleotides
identical to previously identified FY mRNAs or cDNAs, and previously described fragments thereof.
Polynucleotides comprising a variant FY RNA or DNA sequence may be isolated from a biological
sample using well-known molecular biological procedures or may be chemically synthesized.

As used herein, a polymorphic variant of a FY gene, mRNA or cDNA fragment comprises at
least one novel polymorphism identified herein and has a length of at least 10 nucleotides and may
range up to the full length of the gene. Preferably, such fragments are between 100 and 3000
nucleotides in length, and more preferably between 200 and 2000 nucleotides in length, and most
preferably between 500 and 1000 nucleotides in length.

In describing the FY polymorphic sites identified herein, reference is made to the sense strand 
of the gene for convenience. However, as recognized by the skilled artisan, nucleic acid molecules
containing the FY gene or cDNA may be complementary double stranded molecules and thus
reference to a particular site on the sense strand refers as well to the corresponding site on the
complementary antisense strand. Thus, reference may be made to the same polymorphic site on either
strand and an oligonucleotide may be designed to hybridize specifically to either strand at a target
region containing the polymorphic site. Thus, the invention also includes single-stranded
polynucleotides which are complementary to the sense strand of the FY genomic, mRNA and cDNA
variants described herein. 
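The correspondence between a site on the sense strand and the same site on the complementary strand can be illustrated as follows. The eight-base sequence is a made-up example, not part of SEQ ID NO:1, and positions are counted from 1 at the 5' end of each strand.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the antisense strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def antisense_position(sense_pos, length):
    """1-based position on the antisense strand that pairs with a 1-based sense position."""
    return length - sense_pos + 1

sense = "ATGCCTGA"                     # hypothetical sense-strand fragment
antisense = reverse_complement(sense)
site = 4                               # polymorphic site on the sense strand (a C here)
paired = antisense_position(site, len(sense))
```

In this example the C at sense position 4 pairs with the G at antisense position 5, so an oligonucleotide can be designed against either strand for the same polymorphic site.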

Polynucleotides comprising a polymorphic gene variant or fragment of the invention may be 
useful for therapeutic purposes. For example, where a patient could benefit from expression, or 
increased expression, of a particular FY protein isoform, an expression vector encoding the isoform
may be administered to the patient. The patient may be one who lacks the FY isogene encoding that
isoform or may already have at least one copy of that isogene. 

In other situations, it may be desirable to decrease or block expression of a particular FY
isogene. Expression of a FY isogene may be turned off by transforming a targeted organ, tissue or cell
population with an expression vector that expresses high levels of untranslatable mRNA or antisense
RNA for the isogene or fragment thereof. Alternatively, oligonucleotides directed against the
regulatory regions (e.g., promoter, introns, enhancers, 3' untranslated region) of the isogene may block
transcription. Oligonucleotides targeting the transcription initiation site, e.g., between positions -10
and +10 from the start site are preferred. Similarly, inhibition of transcription can be achieved using
oligonucleotides that base-pair with region(s) of the isogene DNA to form triplex DNA (see e.g., Gee
et al. in Huber, B.E. and B.I. Carr, Molecular and Immunologic Approaches, Futura Publishing Co.,
Mt. Kisco, N.Y., 1994). Antisense oligonucleotides may also be designed to block translation of FY
mRNA transcribed from a particular isogene. It is also contemplated that ribozymes may be designed
that can catalyze the specific cleavage of FY mRNA transcribed from a particular isogene.

The untranslated mRNA, antisense RNA or antisense oligonucleotides may be delivered to a
target cell or tissue by expression from a vector introduced into the cell or tissue in vivo or ex vivo.
Alternatively, such molecules may be formulated as a pharmaceutical composition for administration 
to the patient. Oligoribonucleotides and/or oligodeoxynucleotides intended for use as antisense
oligonucleotides may be modified to increase stability and half-life. Possible modifications include, 
but are not limited to, phosphorothioate or 2' O-methyl linkages, and the inclusion of nontraditional
bases such as inosine and queuosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of
adenine, cytosine, guanine, thymine, and uracil which are not as easily recognized by endogenous 
nucleases. 

The invention also provides an isolated polypeptide comprising a polymorphic variant of (a) 
the reference FY amino acid sequence shown in Figure 3 or (b) a fragment of this reference sequence. 
The location of a variant amino acid in a FY polypeptide or fragment of the invention is preferably 
identified by aligning its sequence against SEQ ID NO: 3 (Fig. 3). A FY protein variant of the 
invention comprises an amino acid sequence identical to SEQ ID NO:3 for those regions of SEQ ID 
NO:3 that are encoded by examined portions of the FY gene (as described in the Examples below), 
except for having one or more variant amino acids selected from the group consisting of phenylalanine
at a position corresponding to amino acid position 69, isoleucine at a position corresponding to amino
acid position 203, isoleucine at a position corresponding to amino acid position 203 and phenylalanine
at a position corresponding to amino acid position 328, and may also comprise one or more additional
variant amino acids selected from the group consisting of glycine at a position corresponding to amino 
acid position 44, cysteine at a position corresponding to amino acid position 91 and threonine at a 
position corresponding to amino acid position 102. Thus, a FY fragment of the invention, also referred 
to herein as a FY peptide variant, is any fragment of a FY protein variant that contains one or more of 
the amino acid variations described herein. The invention specifically excludes amino acid sequences 
identical to those previously identified for FY, including SEQ ID NO:3, and previously described 
fragments thereof. FY protein variants included within the invention comprise all amino acid 
sequences based on SEQ ID NO: 3 and having any combination of amino acid variations described 
herein. In preferred embodiments, a FY protein variant of the invention is encoded by an isogene 
defined by one of the observed haplotypes, 2-4, 6, 9, 10-12, and 16-18, shown in Table 4. 
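The amino acid positions recited above line up with the cDNA nucleotide positions recited earlier if nucleotide 1 of SEQ ID NO:2 is taken as the first base of the initiation codon and codons are counted in consecutive triplets. That numbering is an assumption made for this sketch, although the recited positions are consistent with it. The function names are illustrative.

```python
def codon_number(nucleotide_pos):
    """1-based codon (amino acid) number containing a 1-based coding-sequence position."""
    return (nucleotide_pos - 1) // 3 + 1

def position_in_codon(nucleotide_pos):
    """Position (1, 2 or 3) of the nucleotide within its codon."""
    return (nucleotide_pos - 1) % 3 + 1

# Nucleotide positions and amino acid positions as recited in this description.
RECITED = {205: 69, 608: 203, 609: 203, 983: 328, 131: 44, 271: 91, 304: 102}

computed = {nt: codon_number(nt) for nt in RECITED}
```

Under this numbering, nucleotides 608 and 609 are the second and third bases of the same codon, which is why amino acid position 203 appears twice in the list of variant amino acids.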

A FY peptide variant of the invention is at least 6 amino acids in length and is preferably any 
number between 6 and 30 amino acids long, more preferably between 10 and 25, and most preferably 
between 15 and 20 amino acids long. Such FY peptide variants may be useful as antigens to generate 
antibodies specific for one of the above FY isoforms. In addition, the FY peptide variants may be 
useful in drug screening assays. 

A FY variant protein or peptide of the invention may be prepared by chemical synthesis or by 
expressing an appropriate variant FY genomic or cDNA sequence described above. Alternatively, the 
FY protein variant may be isolated from a biological sample of an individual having a FY isogene 
which encodes the variant protein. Where the sample contains two different FY isoforms (i.e., the 
individual has different FY isogenes), a particular FY isoform of the invention can be isolated by 
immunoaffinity chromatography using an antibody which specifically binds to that particular FY 
isoform but does not bind to the other FY isoform. 

WO 02/30950 PCT/US01/42725 

The expressed or isolated FY protein or peptide may be detected by methods known in the art, 
including Coomassie blue staining, silver staining, and Western blot analysis using antibodies specific 
for the isoform of the FY protein or peptide as discussed further below. FY variant proteins and 
peptides can be purified by standard protein purification procedures known in the art, including 
differential precipitation, molecular sieve chromatography, ion-exchange chromatography, isoelectric 
focusing, gel electrophoresis, affinity and immunoaffinity chromatography and the like (Ausubel et 
al., 1987, In Current Protocols in Molecular Biology, John Wiley and Sons, New York, New York). In 
the case of immunoaffinity chromatography, antibodies specific for a particular polymorphic variant 
may be used. 

A polymorphic variant FY gene of the invention may also be fused in frame with a 
heterologous sequence to encode a chimeric FY protein. The non-FY portion of the chimeric protein 
may be recognized by a commercially available antibody. In addition, the chimeric protein may also 
be engineered to contain a cleavage site located between the FY and non-FY portions so that the FY 
protein may be cleaved and purified away from the non-FY portion. 

An additional embodiment of the invention relates to using a novel FY protein isoform, or a 
fragment thereof, in any of a variety of drug screening assays. Such screening assays may be 
performed to identify agents that bind specifically to all known FY protein isoforms or to only a subset 
of one or more of these isoforms. The agents may be from chemical compound libraries, peptide 
libraries and the like. The FY protein or peptide variant may be free in solution or affixed to a solid 
support. In one embodiment, high throughput screening of compounds for binding to a FY variant 
may be accomplished using the method described in PCT application WO84/03565, in which large 
numbers of test compounds are synthesized on a solid substrate, such as plastic pins or some other 
surface, contacted with the FY protein(s) of interest and then washed. Bound FY protein(s) are then 
detected using methods well-known in the art. 

In another embodiment, a novel FY protein isoform may be used in assays to measure the 
binding affinities of one or more candidate drugs targeting the FY protein. 

In yet another embodiment, when a particular FY haplotype or group of FY haplotypes 
encodes a FY protein variant with an amino acid sequence distinct from that of FY protein isoforms 
encoded by other FY haplotypes, then detection of that particular FY haplotype or group of FY 
haplotypes may be accomplished by detecting expression of the encoded FY protein variant using any 
of the methods described herein or otherwise commonly known to the skilled artisan. 

In another embodiment, the invention provides antibodies specific for and immunoreactive 
with one or more of the novel FY protein or peptide variants described herein. The antibodies may be 
either monoclonal or polyclonal in origin. The FY protein or peptide variant used to generate the 
antibodies may be from natural or recombinant sources (in vitro or in vivo) or produced by chemical 
synthesis or semi-synthetic synthesis using synthesis techniques known in the art. If the FY protein or 
peptide variant is of insufficient size to be antigenic, it may be concatenated or conjugated, complexed, 
or otherwise covalently linked to a carrier molecule to enhance the antigenicity of the peptide. 
Examples of carrier molecules include, but are not limited to, albumins (e.g., human, bovine, fish, 
ovine), and keyhole limpet hemocyanin (Basic and Clinical Immunology, 1991, Eds. D.P. Stites, and 
A.I. Terr, Appleton and Lange, Norwalk Connecticut, San Mateo, California). 

In one embodiment, an antibody specifically immunoreactive with one of the novel protein or 
peptide variants described herein is administered to an individual to neutralize activity of the FY 
isoform expressed by that individual. The antibody may be formulated as a pharmaceutical 
composition which includes a pharmaceutically acceptable carrier. 

Antibodies specific for and immunoreactive with one of the novel protein isoforms described 
herein may be used to immunoprecipitate the FY protein variant from solution as well as react with FY 
protein isoforms on Western or immunoblots of polyacrylamide gels on membrane supports or 
substrates. In another preferred embodiment, the antibodies will detect FY protein isoforms in paraffin 
or frozen tissue sections, or in cells which have been fixed or unfixed and prepared on slides, 
coverslips, or the like, for use in immunocytochemical, immunohistochemical, and 
immunofluorescence techniques. 

In another embodiment, an antibody specifically immunoreactive with one of the novel FY 
protein variants described herein is used in immunoassays to detect this variant in biological samples. 
In this method, an antibody of the present invention is contacted with a biological sample and the 
formation of a complex between the FY protein variant and the antibody is detected. As described, 
suitable immunoassays include radioimmunoassay, Western blot assay, immunofluorescent assay, 
enzyme-linked immunoassay (ELISA), chemiluminescent assay, immunohistochemical assay, 
immunocytochemical assay, and the like (see, e.g., Principles and Practice of Immunoassay, 1991, Eds. 
Christopher P. Price and David J. Newman, Stockton Press, New York, New York; Current Protocols 
in Molecular Biology, 1987, Eds. Ausubel et al., John Wiley and Sons, New York, New York). 
Standard techniques known in the art for ELISA are described in Methods in Immunodiagnosis, 2nd 
Ed., Eds. Rose and Bigazzi, John Wiley and Sons, New York 1980; and Campbell et al., 1984, 
Methods in Immunology, W.A. Benjamin, Inc. Such assays may be direct, indirect, competitive, or 
noncompetitive as described in the art (see, e.g., Principles and Practice of Immunoassay, 1991, Eds. 
Christopher P. Price and David J. Newman, Stockton Press, NY, NY; and Oellerich, M., 1984, J. Clin. 
Chem. Clin. Biochem., 22:895-904). Proteins may be isolated from test specimens and biological 
samples by conventional methods, as described in Current Protocols in Molecular Biology, supra. 

Exemplary antibody molecules for use in the detection and therapy methods of the present 
invention are intact immunoglobulin molecules, substantially intact immunoglobulin molecules, or 
those portions of immunoglobulin molecules that contain the antigen binding site. Polyclonal or 
monoclonal antibodies may be produced by methods conventionally known in the art (e.g., Kohler and 
Milstein, 1975, Nature, 256:495-497; Campbell Monoclonal Antibody Technology, the Production and 
Characterization of Rodent and Human Hybridomas, 1985, In: Laboratory Techniques in Biochemistry 
and Molecular Biology, Eds. Burdon et al., Volume 13, Elsevier Science Publishers, Amsterdam). The 
antibodies or antigen binding fragments thereof may also be produced by genetic engineering. The 
technology for expression of both heavy and light chain genes in E. coli is the subject of PCT patent 
applications, publication number WO 901443, WO 901443 and WO 9014424 and in Huse et al., 1989, 
Science, 246:1275-1281. The antibodies may also be humanized (e.g., Queen, C. et al., 1989, Proc. 
Natl. Acad. Sci. USA 86:10029). 

Effect(s) of the polymorphisms identified herein on expression of FY may be investigated by 
various means known in the art, such as by in vitro translation of mRNA transcripts of the FY gene, 
cDNA or fragment thereof, or by preparing recombinant cells and/or nonhuman recombinant 
organisms, preferably recombinant animals, containing a polymorphic variant of the FY gene. As used 
herein, "expression" includes but is not limited to one or more of the following: transcription of the 
gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature 
mRNA; mRNA stability; translation of the mature mRNA(s) into FY protein(s) (including effects of 
polymorphisms on codon usage and tRNA availability); and glycosylation and/or other modifications 
of the translation product, if required for proper expression and function. 

To prepare a recombinant cell of the invention, the desired FY isogene, cDNA or coding 
sequence may be introduced into the cell in a vector such that the isogene, cDNA or coding sequence 
remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the 
extrachromosomal location. In a preferred embodiment, the FY isogene, cDNA or coding sequence is 
introduced into a cell in such a way that it recombines with the endogenous FY gene present in the 
cell. Such recombination requires the occurrence of a double recombination event, thereby resulting in 
the desired FY gene polymorphism. Vectors for the introduction of genes both for recombination and 
for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct 
may be used in the invention. Methods such as electroporation, particle bombardment, calcium 
phosphate co-precipitation and viral transduction for introducing DNA into cells are known in the art; 
therefore, the choice of method may lie with the competence and preference of the skilled practitioner. 
Examples of cells into which the FY isogene, cDNA or coding sequence may be introduced include, 
but are not limited to, continuous culture cells, such as COS, CHO, NIH/3T3, and primary or culture 
cells of the relevant tissue type, i.e., they express the FY isogene, cDNA or coding sequence. Such 
recombinant cells can be used to compare the biological activities of the different protein variants. 

Recombinant nonhuman organisms, i.e., transgenic animals, expressing a variant FY gene, 
cDNA or coding sequence are prepared using standard procedures known in the art. Preferably, a 
construct comprising the variant gene, cDNA or coding sequence is introduced into a nonhuman 
animal or an ancestor of the animal at an embryonic stage, i.e., the one-cell stage, or generally not later 
than about the eight-cell stage. Transgenic animals carrying the constructs of the invention can be 
made by several methods known to those having skill in the art. One method involves transfecting into 
the embryo a retrovirus constructed to contain one or more insulator elements, a gene or genes (or 
cDNA or coding sequence) of interest, and other components known to those skilled in the art to 
provide a complete shuttle vector harboring the insulated gene(s) as a transgene, see e.g., U.S. Patent 
No. 5,610,053. Another method involves directly injecting a transgene into the embryo. A third 
method involves the use of embryonic stem cells. Examples of animals into which the FY isogene, 
cDNA or coding sequences may be introduced include, but are not limited to, mice, rats, other rodents, 
and nonhuman primates (see "The Introduction of Foreign Genes into Mice" and the cited references 
therein, In: Recombinant DNA, Eds. J.D. Watson, M. Gilman, J. Witkowski, and M. Zoller; W.H. 
Freeman and Company, New York, pages 254-272). Transgenic animals stably expressing a human 
FY isogene, cDNA or coding sequence and producing the encoded human FY protein can be used as 
biological models for studying diseases related to abnormal FY expression and/or activity, and for 
screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the 
symptoms or effects of these diseases. 

An additional embodiment of the invention relates to pharmaceutical compositions for treating 
disorders affected by expression or function of a novel FY isogene described herein. The 
pharmaceutical composition may comprise any of the following active ingredients: a polynucleotide 
comprising one of these novel FY isogenes (or cDNAs or coding sequences); an antisense 
oligonucleotide directed against one of the novel FY isogenes, a polynucleotide encoding such an 
antisense oligonucleotide, or another compound which inhibits expression of a novel FY isogene 
described herein. Preferably, the composition contains the active ingredient in a therapeutically 
effective amount. By therapeutically effective amount is meant that one or more of the symptoms 
relating to disorders affected by expression or function of a novel FY isogene is reduced and/or 
eliminated. The composition also comprises a pharmaceutically acceptable carrier, examples of which 
include, but are not limited to, saline, buffered saline, dextrose, and water. Those skilled in the art may 
employ a formulation most suitable for the active ingredient, whether it is a polynucleotide, 
oligonucleotide, protein, peptide or small molecule antagonist. The pharmaceutical composition may 
be administered alone or in combination with at least one other agent, such as a stabilizing compound. 
Administration of the pharmaceutical composition may be by any number of routes including, but not 
limited to oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, 
intradermal, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or 
rectal. Further details on techniques for formulation and administration may be found in the latest 
edition of Remington's Pharmaceutical Sciences (Mack Publishing Co., Easton, PA). 

For any composition, determination of the therapeutically effective dose of active ingredient 
and/or the appropriate route of administration is well within the capability of those skilled in the art. 
For example, the dose can be estimated initially either in cell culture assays or in animal models. The 
animal model may also be used to determine the appropriate concentration range and route of 
administration. Such information can then be used to determine useful doses and routes for 
administration in humans. The exact dosage will be determined by the practitioner, in light of factors 
relating to the patient requiring treatment, including but not limited to severity of the disease state, 
general health, age, weight and gender of the patient, diet, time and frequency of administration, other 
drugs being taken by the patient, and tolerance/response to the treatment. 

Any or all analytical and mathematical operations involved in practicing the methods of the 
present invention may be implemented by a computer. In addition, the computer may execute a 
program that generates views (or screens) displayed on a display device and with which the user can 
interact to view and analyze large amounts of information relating to the FY gene and its genomic 
variation, including chromosome location, gene structure, and gene family, gene expression data, 
polymorphism data, genetic sequence data, clinical data, and population data (e.g., data on 
ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations). 
The FY polymorphism data described herein may be stored as part of a relational database (e.g., an 
instance of an Oracle database or a set of ASCII flat files). These polymorphism data may be stored on 
the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other 
storage devices accessible by the computer. For example, the data may be stored on one or more 
databases in communication with the computer via a network. 
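
As a minimal sketch of how such polymorphism data might be held in a relational store, the following uses an in-memory SQLite database as a stand-in for the Oracle instance or flat files mentioned above. The table name, column names and helper functions are hypothetical; the four rows are copied from Table 2 of this specification.

```python
import sqlite3

# Subset of Table 2: (site, nucleotide position in SEQ ID NO:1,
# reference allele, variant allele).
SITES = [
    ("PS1", 2690, "C", "T"),
    ("PS14", 4140, "A", "G"),
    ("PS18", 4617, "C", "T"),
    ("PS19", 4618, "G", "A"),
]


def build_db():
    """Create a throwaway in-memory database holding the polymorphic sites."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE polymorphic_site ("
        " site TEXT PRIMARY KEY,"
        " nt_position INTEGER,"
        " ref_allele TEXT,"
        " var_allele TEXT)"
    )
    conn.executemany("INSERT INTO polymorphic_site VALUES (?, ?, ?, ?)", SITES)
    return conn


def sites_in_region(conn, start, end):
    """Return the site names whose position lies in [start, end], in order."""
    rows = conn.execute(
        "SELECT site FROM polymorphic_site"
        " WHERE nt_position BETWEEN ? AND ? ORDER BY nt_position",
        (start, end),
    )
    return [row[0] for row in rows]


conn = build_db()
# Sites falling in the second examined region (3872-5302).
print(sites_in_region(conn, 3872, 5302))
```

The same query shape would apply to a networked database; only the connection call would differ.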

Preferred embodiments of the invention are described in the following examples. Other 
embodiments within the scope of the claims herein will be apparent to one skilled in the art from 
consideration of the specification or practice of the invention as disclosed herein. It is intended that the 
specification, together with the examples, be considered exemplary only, with the scope and spirit of 
the invention being indicated by the claims which follow the examples. 

EXAMPLES 

The Examples herein are meant to exemplify the various aspects of carrying out the invention 
and are not intended to limit the scope of the invention in any way. The Examples do not include 
detailed descriptions for conventional methods employed, such as in the performance of genomic DNA 
isolation, PCR and sequencing procedures. Such methods are well-known to those skilled in the art 
and are described in numerous publications, for example, Sambrook, Fritsch, and Maniatis, "Molecular 
Cloning: A Laboratory Manual", 2nd Edition, Cold Spring Harbor Laboratory Press, USA, (1989). 

EXAMPLE 1 

This example illustrates examination of various regions of the FY gene for polymorphic sites. 

Amplification of Target Regions 

The following target regions of the FY gene were amplified using PCR primer pairs. The 
primers used for each region are represented below by providing the nucleotide positions of their initial 
and final nucleotides, which correspond to positions in SEQ ID NO:1 (Figure 1). 
PCR Primer Pairs 

Fragment      Forward Primer    Reverse Primer                 PCR Product 
Fragment 1    2486 - 2508       complement of 3095 - 3074      610 nt 
Fragment 2    2773 - 2795       complement of 3355 - 3333      583 nt 
Fragment 3    2812 - 2834       complement of 3318 - 3295      507 nt 
Fragment 4    3048 - 3073       complement of 3562 - 3541      515 nt 
Fragment 5    3255 - 3278       complement of 3779 - 3757      525 nt 
Fragment 6    3872 - 3892       complement of 4431 - 4409      560 nt 
Fragment 7    4118 - 4140       complement of 4726 - 4705      609 nt 
Fragment 8    4402 - 4423       complement of 5039 - 5016      638 nt 
Fragment 9    4706 - 4727       complement of 5302 - 5280      597 nt 
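
The listed product sizes follow directly from the primer coordinates: each product runs from the first nucleotide of the forward primer to the first listed nucleotide of the reverse primer, inclusive. The short check below is a sketch only; the dictionary and function names are illustrative and not part of the specification.

```python
# Coordinates copied from the PCR Primer Pairs table (positions in SEQ ID NO:1).
# fragment number -> (forward primer start, reverse primer start).
# The reverse primer is the complement of a downstream stretch, so its first
# listed position is the far end of the amplified product.
PRIMER_STARTS = {
    1: (2486, 3095),
    2: (2773, 3355),
    3: (2812, 3318),
    4: (3048, 3562),
    5: (3255, 3779),
    6: (3872, 4431),
    7: (4118, 4726),
    8: (4402, 5039),
    9: (4706, 5302),
}


def product_length(forward_start, reverse_start):
    """Length in nucleotides of the product spanning both primers, inclusive."""
    return reverse_start - forward_start + 1


lengths = {frag: product_length(*starts) for frag, starts in PRIMER_STARTS.items()}
for frag, nt in lengths.items():
    print(f"Fragment {frag}: {nt} nt")
```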



These primer pairs were used in PCR reactions containing genomic DNA isolated from 
immortalized cell lines for each member of the Index Repository. The PCR reactions were carried out 
under the following conditions: 



Reaction volume                                             = 10 µl 

10 x Advantage 2 Polymerase reaction buffer (Clontech)      = 1 µl 

100 ng of human genomic DNA                                 = 1 µl 

10 mM dNTP                                                  = 0.4 µl 

Advantage 2 Polymerase enzyme mix (Clontech)                = 0.2 µl 

Forward Primer (10 µM)                                      = 0.4 µl 

Reverse Primer (10 µM)                                      = 0.4 µl 

Water                                                       = 6.6 µl 
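
As a quick arithmetic check on the reaction mix, the sketch below sums the component volumes and derives final concentrations by simple dilution. It assumes the listed volumes are in microlitres and the primer stocks are 10 µM; the dictionary keys and the function name are illustrative.

```python
from decimal import Decimal

# Component volumes in microlitres, as listed above (assumed units).
VOLUMES_UL = {
    "10x reaction buffer": Decimal("1"),
    "genomic DNA (100 ng)": Decimal("1"),
    "10 mM dNTP": Decimal("0.4"),
    "polymerase enzyme mix": Decimal("0.2"),
    "forward primer (10 uM)": Decimal("0.4"),
    "reverse primer (10 uM)": Decimal("0.4"),
    "water": Decimal("6.6"),
}


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


total_ul = sum(VOLUMES_UL.values())
primer_um = final_concentration(
    Decimal("10"), VOLUMES_UL["forward primer (10 uM)"], total_ul
)
dntp_mm = final_concentration(Decimal("10"), VOLUMES_UL["10 mM dNTP"], total_ul)
dna_ng_per_ul = Decimal("100") / total_ul

print("total volume (ul):", total_ul)
print("each primer (uM):", primer_um)
print("dNTP (mM):", dntp_mm)
print("genomic DNA (ng/ul):", dna_ng_per_ul)
```

Decimal arithmetic is used so that the component volumes sum exactly to the stated reaction volume rather than to a floating-point approximation.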



Amplification profile: 
97°C - 2 min.      1 cycle 

97°C - 15 sec.  } 
70°C - 45 sec.  }  10 cycles 
72°C - 45 sec.  } 

97°C - 15 sec.  } 
64°C - 45 sec.  }  35 cycles 
72°C - 45 sec.  } 
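
For planning purposes, the programmed hold time of this profile can be totalled as sketched below. The data structure and function name are illustrative, and the figure excludes ramp times between temperatures, which depend on the instrument.

```python
# Each stage: (number of cycles, [(temperature in deg C, hold time in seconds), ...]).
PROFILE = [
    (1, [(97, 120)]),
    (10, [(97, 15), (70, 45), (72, 45)]),
    (35, [(97, 15), (64, 45), (72, 45)]),
]


def total_hold_seconds(profile):
    """Sum of all programmed hold times, ignoring temperature ramps."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in profile)


# Three-step stages are the amplification cycles; the first stage is a single
# initial denaturation.
amplification_cycles = sum(cycles for cycles, steps in PROFILE if len(steps) == 3)
hold_s = total_hold_seconds(PROFILE)

print("amplification cycles:", amplification_cycles)
print("programmed hold time (min):", hold_s / 60)
```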

Sequencing of PCR Products 

The PCR products were purified using a Whatman/Polyfiltronics 100 µl 384-well unifilter 
plate essentially according to the manufacturer's protocol. The purified DNA was eluted in 50 µl of 
distilled water. Sequencing reactions were set up using Applied Biosystems Big Dye Terminator 
chemistry essentially according to the manufacturer's protocol. The purified PCR products were 
sequenced in both directions using the primer sets represented below by the nucleotide positions of 
their initial and final nucleotides, which correspond to positions in SEQ ID NO: 1 (Figure 1). Reaction 
products were purified by isopropanol precipitation, and run on an Applied Biosystems 3700 DNA 
Analyzer. 
Sequencing Primer Pairs 

Fragment      Forward Primer    Reverse Primer 
Fragment 1    2515 - 2536       complement of 3054 - 3035 
Fragment 2    2799 - 2818       complement of 3330 - 3311 
Fragment 3    2863 - 2881       complement of 3280 - 3261 
Fragment 4    3071 - 3090       complement of 3487 - 3468 
Fragment 5    3286 - 3305       complement of 3757 - 3738 
Fragment 6    3898 - 3917       complement of 4396 - 4377 
Fragment 7    4162 - 4181       complement of 4705 - 4686 
Fragment 8    4435 - 4453       complement of 4942 - 4922 
Fragment 9    4730 - 4750       complement of 5246 - 5227 



Analysis of Sequences for Polymorphic Sites 

Sequence information for a minimum of 80 humans was analyzed for the presence of 
polymorphisms using the Polyphred program (Nickerson et al., Nucleic Acids Res. 14:2745-2751, 
1997). The presence of a polymorphism was confirmed on both strands. The polymorphisms and their 
locations in the FY reference genomic sequence (SEQ ID NO: 1) are listed in Table 2 below. 
Table 2. Polymorphic Sites Identified in the FY Gene 

Polymorphic               Nucleotide   Reference   Variant   CDS Variant   AA 
Site No.      PolyId(a)   Position     Allele      Allele    Position      Variant 
PS1           8104322     2690         C           T 
PS2           8104330     2864         G           A 
PS3           8104332     2882         A           G 
PS4           8104334     2910         C           T 
PS5           8104340     2949         C           A 
PS6           8104344     2980         G           C 
PS7           8104346     2996         C           T 
PS8           8104350     3259         T           C 
PS9(R)        8104352     3470         T           C 
PS10          8104354     3672         C           T 
PS11          8104356     3707         C           T 
PS12          8104358     3979         A           G 
PS13          8104360     3997         C           T 
PS14(R)       8104364     4140         A           G         131           D44G 
PS15          8104366     4214         C           T         205           L69F 
PS16(R)       8104370     4280         C           T         271           R91C 
PS17(R)       8104372     4313         G           A         304           A102T 
PS18          8104379     4617         C           T         608           T203M 
PS19          8104381     4618         G           A         609           T203T 
PS20          8104387     4992         C           T         983           S328F 
Double PS                                                                  T203I 

(a) PolyId is a unique identifier assigned to each PS by Genaissance Pharmaceuticals, Inc. 

(b) Double PS Variant refers to an amino acid change caused by two polymorphic sites within the 
same codon 

(R) Reported previously. 
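
The relationship between the columns of Table 2 can be illustrated with a short sketch. It assumes a constant offset of 4009 between a position in SEQ ID NO:1 and the CDS position, which is inferred from the coding rows of the table (for example 4140 - 131) and is not stated in the text. The reference codon ACG at position 203 is likewise a deduction: the reference alleles C and G at PS18 and PS19 occupy the second and third bases of a threonine codon. All names in the code are illustrative.

```python
# Assumed offset between SEQ ID NO:1 position and CDS position (inferred from
# Table 2, e.g. 4140 - 131 = 4009; holds for every coding row listed there).
CDS_OFFSET = 4009

# Only the codons needed for this example, taken from the standard genetic code.
CODONS = {"ACG": "T", "ATG": "M", "ACA": "T", "ATA": "I"}


def cds_position(nt_position):
    """CDS position for a nucleotide position in SEQ ID NO:1 (assumed offset)."""
    return nt_position - CDS_OFFSET


def codon_number(cds_pos):
    """1-based amino acid position containing a 1-based CDS position."""
    return (cds_pos + 2) // 3


def apply_variants(ref_codon, codon_no, variants):
    """Return the codon after substituting each (nt_position, allele) variant."""
    bases = list(ref_codon)
    for nt_position, allele in variants:
        pos = cds_position(nt_position)
        if codon_number(pos) != codon_no:
            raise ValueError(f"position {nt_position} is outside codon {codon_no}")
        bases[(pos - 1) % 3] = allele
    return "".join(bases)


REF_CODON_203 = "ACG"  # deduced reference codon for Thr203 (assumption)
PS18 = (4617, "T")
PS19 = (4618, "A")

for label, variants in [("PS18", [PS18]), ("PS19", [PS19]), ("Double PS", [PS18, PS19])]:
    codon = apply_variants(REF_CODON_203, 203, variants)
    print(f"{label}: {codon} -> T203{CODONS[codon]}")
```

Under these assumptions PS18 alone gives T203M, PS19 alone is silent (T203T), and the two together give the T203I change listed in the Double PS row.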



EXAMPLE 2 

This example illustrates analysis of the FY polymorphisms identified in the Index Repository 
for human genotypes and haplotypes. 

The different genotypes containing these polymorphisms that were observed in unrelated 
members of the reference population are shown in Table 3 below, with the haplotype pair indicating 
the combination of haplotypes determined for the individual using the haplotype derivation protocol 
described below. In Table 3, homozygous positions are indicated by one nucleotide and heterozygous 
positions are indicated by two nucleotides. Missing nucleotides in any given genotype in Table 3 were 
inferred based on linkage disequilibrium and/or Mendelian inheritance. 
Table 3 (Part 1). Genotypes and Haplotype Pairs Observed for FY Gene 

Genotype                Polymorphic Sites 
Number     HAP Pair     PS1   PS2   PS3   PS4   PS5   PS6   PS7   PS8   PS9   PS10 
1          10   10      C     G     A     C     C     G     C     T     T     T 
2          21   21      C     G     A     T     C     G     T     T     T     C 
3          7    7       C     G     A     C     C     G     C     T     T     C 
4          14   14      C     G     A     C     C     G     C     T     T     T 
5          5    5       C     G     A     C     C     G     C     T     C     T 
6          12   12      C     G     A     C     C     G     C     T     T     T 
7          10   16      C     G     A     C     C     G     C     T     T     T 
8          7    1       C     G/A   A     C     C     G     C     T     T     C 
9          10   14      C     G     A     C     C     G     C     T     T     T 
10         21   15      C     G     A     T/C   C     G     T/C   T     T     C/T 
11         5    8       C     G     A     C     C     G     C     T     C/T   T/C 
12         5    14      C     G     A     C     C     G     C     T     C/T   T 
13         10   6       C     G     A     C     C     G     C     T     T     T/C 
14         18   2       C     G     A     C     C/A   G     C     T     T     T 
15         15   19      C     G     A     C/T   C     G     C     T     T     T 
16         21   7       C     G     A     T/C   C     G     T/C   T     T     C 
17         5    21      C     G     A     C/T   C     G     C/T   T     C/T   T/C 
18         10   13      C     G     A     C     C     G     C     T     T     T 
19         10   1       C     G/A   A     C     C     G     C     T     T     T/C 
20         14   1       C     G/A   A     C     C     G     C     T     T     T/C 
21         10   21      C     G     A     C/T   C     G     C/T   T     T     T/C 
22         10   18      C     G     A     C     C     G     C     T     T     T 
23         18   15      C     G     A     C     C     G     C     T     T     T 
24         10   7       C     G     A     C     C     G     C     T     T     T/C 
25         5    23      C/T   G     A     C     C     G     C     T     C     T 
26         10   5       C     G     A     C     C     G     C     T     T/C   T 
27         10   12      C     G     A     C     C     G     C     T     T     T 
28         5    22      C     G     A/G   C     C     G     C     T     C     T/C 
29         5    7       C     G     A     C     C     G     C     T     C/T   T/C 
30         7    22      C     G     A/G   C     C     G     C     T     T/C   C 
31         21   14      C     G     A     T/C   C     G     T/C   T     T     C/T 
32         10   3       C     G     A     C     C     G/C   C     T     T     T 
33         10   9       C     G     A     C     C     G     C     T     T     T 
34         10   11      C     G     A     C     C     G     C     T     T     T 
35         20   2       C     G     A     T/C   C/A   G     C     T     T     T 
36         10   20      C     G     A     C/T   C     G     C     T     T     T 
37         20   7       C     G     A     T/C   C     G     C     T     T     T/C 
38         10   17      C     G     A     C     C     G     C     T     T     T 
39         10   4       C     G     A     C     C     G     C     T/C   T     T 
40         5    18      C     G     A     C     C     G     C     T     C/T   T 
Table 3 (Part 2). Genotypes and Haplotype Pairs Observed for FY Gene 

Genotype                Polymorphic Sites 
Number     HAP Pair     PS11  PS12  PS13  PS14  PS15  PS16  PS17  PS18  PS19  PS20 
1          10   10      C     A     C     G     C     C     G     C     G     C 
2          21   21      C     A     C     A     C     C     G     C     G     C 
3          7    7       C     G     C     A     C     C     G     C     G     C 
4          14   14      T     A     C     A     C     C     A     C     G     C 
5          5    5       C     A     C     A     C     C     G     C     G     C 
6          12   12      C     A     C     G     T     C     G     C     G     C 
7          10   16      C/T   A     C     G/A   C     C/T   G/A   C     G     C 
8          7    1       C     G     C     A     C     C     G     C     G     C 
9          10   14      C/T   A     C     G/A   C     C     G/A   C     G     C 
10         21   15      C/T   A     C     A     C     C     G     C     G     C 
11         5    8       C     A/G   C/T   A     C     C     G     C     G     C 
12         5    14      C/T   A     C     A     C     C     G/A   C     G     C 
13         10   6       C     A     C     G     C     C     G     C     G     C 
14         18   2       T     A     C     G/A   C     C     G/A   C     G     C 
15         15   19      T/C   A     C     A     C     C     G     C     G     C 
16         21   7       C     A/G   C     A     C     C     G     C     G     C 
17         5    21      C     A     C     A     C     C     G     C     G     C 
18         10   13      C     A     C     G     C/T   C     G     C     G     C/T 
19         10   1       C     A/G   C     G/A   C     C     G     C     G     C 
20         14   1       T/C   A/G   C     A     C     C     A/G   C     G     C 
21         10   21      C     A     C     G/A   C     C     G     C     G     C 
22         10   18      C/T   A     C     G     C     C     G     C     G     C 
23         18   15      T     A     C     G/A   C     C     G     C     G     C 
24         10   7       C     A/G   C     G/A   C     C     G     C     G     C 
25         5    23      C     A     C     A     C     C     G     C     G     C 
26         10   5       C     A     C     G/A   C     C     G     C     G     C 
27         10   12      C     A     C     G     C/T   C     G     C     G     C 
28         5    22      C     A     C     A     C     C     G     C     G     C 
29         5    7       C     A/G   C     A     C     C     G     C     G     C 
30         7    22      C     G/A   C     A     C     C     G     C     G     C 
31         21   14      C/T   A     C     A     C     C     G/A   C     G     C 
32         10   3       C     A     C     G     C     C     G     C     G     C 
33         10   9       C     A     C     G     C     C     G     C     G/A   C 
34         10   11      C     A     C     G     C     C     G     C/T   G     C 
35         20   2       T     A     C     A     C     C     G/A   C     G     C 
36         10   20      C/T   A     C     G/A   C     C     G     C     G     C 
37         20   7       T/C   A/G   C     A     C     C     G     C     G     C 
38         10   17      C/T   A     C     G/A   C/T   C     G     C     G     C 
39         10   4       C     A     C     G     C     C     G     C     G     C 
40         5    18      C/T   A     C     A/G   C     C     G     C     G     C 



The haplotype pairs shown in Table 3 were estimated from the unphased genotypes using a 
computer-implemented extension of Clark's algorithm (Clark, A.G. 1990 Mol Bio Evol 7:111-122) for 
assigning haplotypes to unrelated individuals in a population sample, as described in 
PCT/US01/12831, filed April 18, 2001. In this method, haplotypes are assigned directly from 
individuals who are homozygous at all sites or heterozygous at no more than one of the variable sites. 
This list of haplotypes is then used to deconvolute the unphased genotypes in the remaining (multiply 
heterozygous) individuals. In the present analysis, the list of haplotypes was augmented with 
haplotypes obtained from two families (one three-generation Caucasian family and one two-generation 
African-American family). 
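
The two steps described above can be sketched as follows. This is a simplified illustration of the basic Clark procedure only, not the extension described in PCT/US01/12831, and it does not use family data. The function names are illustrative. The example genotypes are taken from Table 3 (genotypes 1, 5 and 26), restricted to sites PS9, PS10, PS12 and PS14 for brevity.

```python
def is_compatible(hap, genotype):
    """True if the haplotype could be one of the two chromosomes of the genotype."""
    return all(allele in site for allele, site in zip(hap, genotype))


def complement(hap, genotype):
    """The other haplotype implied by removing `hap` from the genotype."""
    return tuple(b if allele == a else a for allele, (a, b) in zip(hap, genotype))


def clark_phase(genotypes):
    """Resolve genotypes (lists of allele pairs) into haplotype pairs.

    Returns (resolved, unresolved): a dict of index -> (hap1, hap2) and a list
    of indices that no known haplotype could explain.
    """
    known, resolved, pending = [], {}, []
    # Step 1: individuals with at most one heterozygous site are unambiguous.
    for idx, g in enumerate(genotypes):
        n_het = sum(1 for a, b in g if a != b)
        if n_het <= 1:
            pair = (tuple(a for a, _ in g), tuple(b for _, b in g))
            resolved[idx] = pair
            for hap in pair:
                if hap not in known:
                    known.append(hap)
        else:
            pending.append(idx)
    # Step 2: explain multiply heterozygous individuals with known haplotypes,
    # adding each newly implied haplotype to the list, until nothing changes.
    progress = True
    while pending and progress:
        progress = False
        for idx in list(pending):
            g = genotypes[idx]
            match = next((h for h in known if is_compatible(h, g)), None)
            if match is None:
                continue
            other = complement(match, g)
            resolved[idx] = (match, other)
            if other not in known:
                known.append(other)
            pending.remove(idx)
            progress = True
    return resolved, pending


# Sites: PS9, PS10, PS12, PS14 (values from Table 3).
GENOTYPES = [
    [("T", "T"), ("T", "T"), ("A", "A"), ("G", "G")],  # genotype 1 (HAP pair 10/10)
    [("C", "C"), ("T", "T"), ("A", "A"), ("A", "A")],  # genotype 5 (HAP pair 5/5)
    [("T", "C"), ("T", "T"), ("A", "A"), ("G", "A")],  # genotype 26, unphased
]

resolved, unresolved = clark_phase(GENOTYPES)
print(resolved[2])
print(unresolved)
```

On these four sites the doubly heterozygous genotype 26 resolves into the haplotypes carried by the two homozygous individuals, consistent with the HAP pair 10/5 given for it in Table 3. In general the outcome of this procedure can depend on the order in which individuals are processed.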

By following this protocol, it was determined that the Index Repository examined herein and, 
by extension, the general population contains the 23 human FY haplotypes shown in Table 4 below. 

An FY isogene defined by a full-haplotype shown in Table 4 below comprises the regions of 
the SEQ ID NOS indicated in Table 4, with their corresponding set of polymorphic locations and 
identities, which are also set forth in Table 4. 





Table 4 (Part 1). Haplotypes of the FY gene. 

Regions        PS       PS             Haplotype Number(d) 
Examined(a)    No.(b)   Position(c)    1    2    3    4    5    6    7    8    9    10 
2486-3779      1        2690/30        C    C    C    C    C    C    C    C    C    C 
2486-3779      2        2864/150       A    G    G    G    G    G    G    G    G    G 
2486-3779      3        2882/270       A    A    A    A    A    A    A    A    A    A 
2486-3779      4        2910/390       C    C    C    C    C    C    C    C    C    C 
2486-3779      5        2949/510       C    A    C    C    C    C    C    C    C    C 
2486-3779      6        2980/630       G    G    C    G    G    G    G    G    G    G 
2486-3779      7        2996/750       C    C    C    C    C    C    C    C    C    C 
2486-3779      8        3259/870       T    T    T    C    T    T    T    T    T    T 
2486-3779      9        3470/990       T    T    T    T    C    T    T    T    T    T 
2486-3779      10       3672/1110      C    T    T    T    T    C    C    C    T    T 
2486-3779      11       3707/1230      C    T    C    C    C    C    C    C    C    C 
3872-5302      12       3979/1350      G    A    A    A    A    A    G    G    A    A 
3872-5302      13       3997/1470      C    C    C    C    C    C    C    T    C    C 
3872-5302      14       4140/1590      A    A    G    G    A    G    A    A    G    G 
3872-5302      15       4214/1710      C    C    C    C    C    C    C    C    C    C 
3872-5302      16       4280/1830      C    C    C    C    C    C    C    C    C    C 
3872-5302      17       4313/1950      G    A    G    G    G    G    G    G    G    G 
3872-5302      18       4617/2070      C    C    C    C    C    C    C    C    C    C 
3872-5302      19       4618/2190      G    G    G    G    G    G    G    G    A    G 
3872-5302      20       4992/2310      C    C    C    C    C    C    C    C    C    C 






Table 4 (Part 2). Haplotypes of the FY gene.

Regions      PS      PS           Haplotype Number(d)
Examined(a)  No.(b)  Position(c)  11 12 13 14 15 16 17 18 19 20
2486-3779    1       2690/30      C  C  C  C  C  C  C  C  C  C
2486-3779    2       2864/150     G  G  G  G  G  G  G  G  G  G
2486-3779    3       2882/270     A  A  A  A  A  A  A  A  A  A
2486-3779    4       2910/390     C  C  C  C  C  C  C  C  T  T
2486-3779    5       2949/510     C  C  C  C  C  C  C  C  C  C
2486-3779    6       2980/630     G  G  G  G  G  G  G  G  G  G
2486-3779    7       2996/750     C  C  C  C  C  C  C  C  C  C
2486-3779    8       3259/870     T  T  T  T  T  T  T  T  T  T
2486-3779    9       3470/990     T  T  T  T  T  T  T  T  T  T
2486-3779    10      3672/1110    T  T  T  T  T  T  T  T  T  T
2486-3779    11      3707/1230    C  C  C  T  T  T  T  T  C  T
3872-5302    12      3979/1350    A  A  A  A  A  A  A  A  A  A
3872-5302    13      3997/1470    C  C  C  C  C  C  C  C  C  C
3872-5302    14      4140/1590    G  G  G  A  A  A  A  G  A  A
3872-5302    15      4214/1710    C  T  T  C  C  C  T  C  C  C
3872-5302    16      4280/1830    C  C  C  C  C  T  C  C  C  C
3872-5302    17      4313/1950    G  G  G  A  G  A  G  G  G  G
3872-5302    18      4617/2070    T  C  C  C  C  C  C  C  C  C
3872-5302    19      4618/2190    G  G  G  G  G  G  G  G  G  G
3872-5302    20      4992/2310    C  C  T  C  C  C  C  C  C  C








Table 4 (Part 3). Haplotypes of the FY gene.

Regions      PS      PS           Haplotype Number(d)
Examined(a)  No.(b)  Position(c)  21 22 23
2486-3779    1       2690/30      C  C  T
2486-3779    2       2864/150     G  G  G
2486-3779    3       2882/270     A  G  A
2486-3779    4       2910/390     T  C  C
2486-3779    5       2949/510     C  C  C
2486-3779    6       2980/630     G  G  G
2486-3779    7       2996/750     T  C  C
2486-3779    8       3259/870     T  T  T
2486-3779    9       3470/990     T  C  C
2486-3779    10      3672/1110    C  C  T
2486-3779    11      3707/1230    C  C  C
3872-5302    12      3979/1350    A  A  A
3872-5302    13      3997/1470    C  C  C
3872-5302    14      4140/1590    A  A  A
3872-5302    15      4214/1710    C  C  C
3872-5302    16      4280/1830    C  C  C
3872-5302    17      4313/1950    G  G  G
3872-5302    18      4617/2070    C  C  C
3872-5302    19      4618/2190    G  G  G
3872-5302    20      4992/2310    C  C  C



(a) Region examined represents the nucleotide positions defining the start and stop positions
within SEQ ID NO:1 of the regions sequenced;

(b) PS = polymorphic site;

(c) Position of PS within the indicated SEQ ID NO, with the 1st position number referring to
SEQ ID NO:1 and the 2nd position number referring to SEQ ID NO:84, a modified version of
SEQ ID NO:1 that comprises the context sequence of each polymorphic site, PS1-PS20, to
facilitate electronic searching of the haplotypes;

(d) Alleles for FY haplotypes are presented 5' to 3' in each column.

SEQ ID NO:1 refers to Figure 1, with the two alternative allelic variants of each polymorphic
site indicated by the appropriate nucleotide symbol. SEQ ID NO:84 is a modified version of SEQ ID
NO:1 that shows the context sequence of each of PS1-PS20 in a uniform format to facilitate electronic
searching of the FY haplotypes. For each polymorphic site, SEQ ID NO:84 contains a block of 60
bases of the nucleotide sequence encompassing the centrally-located polymorphic site at the 30th
position, followed by 60 bases of unspecified sequence to represent that each polymorphic site is
separated by genomic sequence whose composition is defined elsewhere herein.
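
The layout just described implies a simple mapping from a polymorphic site number to its position in SEQ ID NO:84. The following Python sketch is illustrative only: the function name and constants are hypothetical, and it assumes that each site occupies one 120-base unit (60 bases of context followed by 60 bases of unspecified sequence) with the polymorphic base at position 30 of its context block. Under that assumption it reproduces the second position numbers listed in Table 4.

```python
CONTEXT_BLOCK = 60  # bases of context sequence surrounding each polymorphic site
SPACER_BLOCK = 60   # bases of unspecified sequence following each context block
SITE_OFFSET = 30    # position of the polymorphic base within its context block

def seq84_position(ps_number):
    """Return the assumed 1-based position of PS<ps_number> within SEQ ID NO:84."""
    if not 1 <= ps_number <= 20:
        raise ValueError("PS number must be between 1 and 20")
    return (ps_number - 1) * (CONTEXT_BLOCK + SPACER_BLOCK) + SITE_OFFSET

# Print the computed position for a few sites for comparison with Table 4.
for n in (1, 2, 10, 20):
    print(f"PS{n}: position {seq84_position(n)}")
```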

Table 5 below shows the percent of chromosomes characterized by a given FY haplotype for
all unrelated individuals in the Index Repository for which haplotype data was obtained. The percent
of these unrelated individuals who have a given FY haplotype pair is shown in Table 6. In Tables 5
and 6, the "Total" column shows this frequency data for all of these unrelated individuals, while the
other columns show the frequency data for these unrelated individuals categorized according to their
self-identified ethnogeographic origin. Abbreviations used in Tables 5 and 6 are AF = African
Descent, AS = Asian, CA = Caucasian, HL = Hispanic-Latino, and AM = Native American.
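
The relationship between the two tables can be checked with a short Python sketch. It assumes only that each individual carries two chromosomes, so that a homozygous pair contributes its full percentage to a haplotype's chromosome frequency and a heterozygous pair contributes half. The dictionary holds the "Total" column of Table 6 for every pair containing haplotype 10; the names are hypothetical.

```python
# "Total" column of Table 6 (percent of individuals) for pairs containing haplotype 10.
PAIRS_WITH_HAP10 = {
    (10, 10): 17.07, (10, 16): 1.22, (10, 14): 1.22, (10, 6): 1.22,
    (10, 13): 1.22, (10, 1): 1.22, (10, 21): 4.88, (10, 18): 2.44,
    (10, 7): 2.44, (10, 5): 1.22, (10, 12): 3.66, (10, 3): 1.22,
    (10, 9): 1.22, (10, 11): 1.22, (10, 20): 12.2, (10, 17): 1.22,
    (10, 4): 1.22,
}

def chromosome_frequency(hap, pair_percent):
    """Percent of chromosomes carrying `hap`, derived from haplotype-pair percentages."""
    total = 0.0
    for (first, second), percent in pair_percent.items():
        copies = (first == hap) + (second == hap)  # 0, 1 or 2 copies per individual
        total += percent * copies / 2
    return total

print(round(chromosome_frequency(10, PAIRS_WITH_HAP10), 2))
```

The result agrees, to rounding, with the 36.59 listed for haplotype 10 in the "Total" column of Table 5.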



Table 5. Frequency of Observed FY Haplotypes In Unrelated Individuals

HAP No.  HAP ID   Total  CA     AF    AS    HL     AM
1        8107557  1.83   0.0    0.0   0.0   8.33   0.0
2        8107559  1.22   4.76   0.0   0.0   0.0    0.0
3        8107568  0.61   0.0    0.0   2.5   0.0    0.0
4        8107564  0.61   0.0    0.0   0.0   2.78   0.0
5        8107548  15.85  0.0    62.5  0.0   2.78   0.0
6        8107565  0.61   0.0    0.0   2.5   0.0    0.0
7        8107551  6.1    9.52   5.0   0.0   5.56   33.33
8        8107563  0.61   0.0    2.5   0.0   0.0    0.0
9        8107566  0.61   0.0    0.0   2.5   0.0    0.0
10       8107547  36.59  42.86  2.5   70.0  30.56  33.33
11       8107567  0.61   0.0    0.0   2.5   0.0    0.0
12       8107555  3.05   0.0    0.0   12.5  0.0    0.0
13       8107561  0.61   0.0    0.0   2.5   0.0    0.0
14       8107552  4.88   2.38   2.5   0.0   16.67  0.0
15       8107556  1.83   4.76   0.0   0.0   2.78   0.0
16       8107562  0.61   0.0    0.0   0.0   2.78   0.0
17       8107569  0.61   0.0    0.0   2.5   0.0    0.0
18       8107553  3.66   7.14   5.0   0.0   2.78   0.0
19       8107560  0.61   2.38   0.0   0.0   0.0    0.0
20       8107549  7.32   16.67  0.0   2.5   5.56   33.33
21       8107550  6.71   9.52   0.0   0.0   19.44  0.0
22       8107554  3.05   0.0    12.5  0.0   0.0    0.0
23       8107558  1.83   0.0    7.5   0.0   0.0    0.0

Table 6. Frequency of Observed FY Haplotype Pairs In Unrelated Individuals

HAP1  HAP2  Total  CA     AF    AS    HL     AM
10    10    17.07  19.05  0.0   45.0  5.56   0.0
21    21    1.22   0.0    0.0   0.0   5.56   0.0
7     7     1.22   0.0    0.0   0.0   0.0    33.33
14    14    2.44   0.0    0.0   0.0   11.11  0.0
5     5     7.32   0.0    30.0  0.0   0.0    0.0
12    12    1.22   0.0    0.0   5.0   0.0    0.0
10    16    1.22   0.0    0.0   0.0   5.56   0.0
7     1     1.22   0.0    0.0   0.0   5.56   0.0
10    14    1.22   0.0    0.0   0.0   5.56   0.0
21    15    1.22   0.0    0.0   0.0   5.56   0.0
5     8     1.22   0.0    5.0   0.0   0.0    0.0
5     14    1.22   0.0    5.0   0.0   0.0    0.0
10    6     1.22   0.0    0.0   5.0   0.0    0.0
18    2     1.22   4.76   0.0   0.0   0.0    0.0
15    19    1.22   4.76   0.0   0.0   0.0    0.0
21    7     2.44   4.76   0.0   0.0   5.56   0.0
5     21    1.22   0.0    0.0   0.0   5.56   0.0
10    13    1.22   0.0    0.0   5.0   0.0    0.0
10    1     1.22   0.0    0.0   0.0   5.56   0.0
14    1     1.22   0.0    0.0   0.0   5.56   0.0
10    21    4.88   9.52   0.0   0.0   11.11  0.0
10    18    2.44   4.76   0.0   0.0   5.56   0.0
18    15    1.22   4.76   0.0   0.0   0.0    0.0
10    7     2.44   9.52   0.0   0.0   0.0    0.0
5     23    3.66   0.0    15.0  0.0   0.0    0.0
10    5     1.22   0.0    5.0   0.0   0.0    0.0
10    12    3.66   0.0    0.0   15.0  0.0    0.0
5     22    4.88   0.0    20.0  0.0   0.0    0.0
5     7     1.22   0.0    5.0   0.0   0.0    0.0
7     22    1.22   0.0    5.0   0.0   0.0    0.0
21    14    1.22   4.76   0.0   0.0   0.0    0.0
10    3     1.22   0.0    0.0   5.0   0.0    0.0
10    9     1.22   0.0    0.0   5.0   0.0    0.0
10    11    1.22   0.0    0.0   5.0   0.0    0.0
20    2     1.22   4.76   0.0   0.0   0.0    0.0
10    20    12.2   23.81  0.0   5.0   11.11  66.67
20    7     1.22   4.76   0.0   0.0   0.0    0.0
10    17    1.22   0.0    0.0   5.0   0.0    0.0
10    4     1.22   0.0    0.0   0.0   5.56   0.0
5     18    2.44   0.0    10.0  0.0   0.0    0.0



The size and composition of the Index Repository were chosen to represent the genetic
diversity across and within four major population groups comprising the general United States
population. For example, as described in Table 1 above, this repository contains approximately equal
sample sizes of African-descent, Asian-American, European-American, and Hispanic-Latino
population groups. Almost all individuals representing each group had all four grandparents with the
same ethnogeographic background. The number of unrelated individuals in the Index Repository
provides a sample size that is sufficient to detect SNPs and haplotypes that occur in the general
population with high statistical certainty. For instance, a haplotype that occurs with a frequency of 5%
in the general population has a probability higher than 99.9% of being observed in a sample of 80
individuals from the general population. Similarly, a haplotype that occurs with a frequency of 10% in
a specific population group has a 99% probability of being observed in a sample of 20 individuals from
that population group. In addition, the size and composition of the Index Repository means that the
relative frequencies determined therein for the haplotypes and haplotype pairs of the FY gene are
likely to be similar to the relative frequencies of these FY haplotypes and haplotype pairs in the
general U.S. population and in the four population groups represented in the Index Repository. The
genetic diversity observed for the three Native Americans is presented because it is of scientific
interest, but due to the small sample size it lacks statistical significance.
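
The detection probabilities quoted above can be approximated with a short Python sketch. It assumes, as a simplification not stated in the text, that each individual contributes two independently sampled chromosomes, so that the chance of missing a haplotype of frequency f in n individuals is (1 - f) raised to the power 2n. The function name is hypothetical.

```python
def detection_probability(frequency, individuals, ploidy=2):
    """Probability of observing a haplotype at least once in the sample,
    assuming independently sampled chromosomes."""
    chromosomes = individuals * ploidy
    return 1.0 - (1.0 - frequency) ** chromosomes

# 5% haplotype in 80 individuals from the general population.
print(f"{detection_probability(0.05, 80):.4%}")
# 10% haplotype in 20 individuals from one population group.
print(f"{detection_probability(0.10, 20):.4%}")
```

Under this assumption the first case exceeds 99.9%, and the second evaluates to roughly 98.5%, which is consistent with the approximately 99% figure given above.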

In view of the above, it will be seen that the several advantages of the invention are achieved 
and other advantageous results attained. 

As various changes could be made in the above methods and compositions without departing 
from the scope of the invention, it is intended that all matter contained in the above description and 
shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense. 

All references cited in this specification, including patents and patent applications, are hereby 
incorporated in their entirety by reference. The discussion of references herein is intended merely to 
summarize the assertions made by their authors and no admission is made that any reference 
constitutes prior art. Applicants reserve the right to challenge the accuracy and pertinency of the cited 
references. 

What is Claimed is:

1. A method for haplotyping the Duffy blood group (FY) gene of an individual, which comprises
determining which of the FY haplotypes shown in the table immediately below defines one
copy of the individual's FY gene, wherein the determining step comprises identifying the
phased sequence of nucleotides present at each of PS1-PS20 on at least one copy of the
individual's FY gene, and wherein each of the FY haplotypes comprises a sequence of
polymorphisms whose positions and identities are set forth in the table immediately below:



PS      PS           Haplotype Number(c) (Part 1)
No.(a)  Position(b)  1  2  3  4  5  6  7  8  9  10
1       2690         C  C  C  C  C  C  C  C  C  C
2       2864         A  G  G  G  G  G  G  G  G  G
3       2882         A  A  A  A  A  A  A  A  A  A
4       2910         C  C  C  C  C  C  C  C  C  C
5       2949         C  A  C  C  C  C  C  C  C  C
6       2980         G  G  C  G  G  G  G  G  G  G
7       2996         C  C  C  C  C  C  C  C  C  C
8       3259         T  T  T  C  T  T  T  T  T  T
9       3470         T  T  T  T  C  T  T  T  T  T
10      3672         C  T  T  T  T  C  C  C  T  T
11      3707         C  T  C  C  C  C  C  C  C  C
12      3979         G  A  A  A  A  A  G  G  A  A
13      3997         C  C  C  C  C  C  C  T  C  C
14      4140         A  A  G  G  A  G  A  A  G  G
15      4214         C  C  C  C  C  C  C  C  C  C
16      4280         C  C  C  C  C  C  C  C  C  C
17      4313         G  A  G  G  G  G  G  G  G  G
18      4617         C  C  C  C  C  C  C  C  C  C
19      4618         G  G  G  G  G  G  G  G  A  G
20      4992         C  C  C  C  C  C  C  C  C  C

PS      PS           Haplotype Number(c) (Part 2)
No.(a)  Position(b)  11 12 13 14 15 16 17 18 19 20
1       2690         C  C  C  C  C  C  C  C  C  C
2       2864         G  G  G  G  G  G  G  G  G  G
3       2882         A  A  A  A  A  A  A  A  A  A
4       2910         C  C  C  C  C  C  C  C  T  T
5       2949         C  C  C  C  C  C  C  C  C  C
6       2980         G  G  G  G  G  G  G  G  G  G
7       2996         C  C  C  C  C  C  C  C  C  C
8       3259         T  T  T  T  T  T  T  T  T  T
9       3470         T  T  T  T  T  T  T  T  T  T
10      3672         T  T  T  T  T  T  T  T  T  T
11      3707         C  C  C  T  T  T  T  T  C  T
12      3979         A  A  A  A  A  A  A  A  A  A
13      3997         C  C  C  C  C  C  C  C  C  C
14      4140         G  G  G  A  A  A  A  G  A  A
15      4214         C  T  T  C  C  C  T  C  C  C
16      4280         C  C  C  C  C  T  C  C  C  C
17      4313         G  G  G  A  G  A  G  G  G  G
18      4617         T  C  C  C  C  C  C  C  C  C
19      4618         G  G  G  G  G  G  G  G  G  G
20      4992         C  C  T  C  C  C  C  C  C  C




PS      PS           Haplotype Number(c) (Part 3)
No.(a)  Position(b)  21 22 23
1       2690         C  C  T
2       2864         G  G  G
3       2882         A  G  A
4       2910         T  C  C
5       2949         C  C  C
6       2980         G  G  G
7       2996         T  C  C
8       3259         T  T  T
9       3470         T  C  C
10      3672         C  C  T
11      3707         C  C  C
12      3979         A  A  A
13      3997         C  C  C
14      4140         A  A  A
15      4214         C  C  C
16      4280         C  C  C
17      4313         G  G  G
18      4617         C  C  C
19      4618         G  G  G
20      4992         C  C  C

















(a) PS = polymorphic site; 

(b) Position of PS within SEQ ID NO:1;

(c) Alleles for haplotypes are presented 5' to 3' in each column. 

2. A method for haplotyping the Duffy blood group (FY) gene of an individual, which comprises
determining which of the FY haplotype pairs shown in the table immediately below defines
both copies of the individual's FY gene, wherein the determining step comprises identifying
the phased sequence of nucleotides present at each of PS1-PS20 on both copies of the
individual's FY gene, and wherein each of the FY haplotype pairs consists of first and second
haplotypes which comprise first and second sequences of polymorphisms whose positions and
identities are set forth in the table immediately below:



PS      PS           Haplotype Pair(c) (Part 1)
No.(a)  Position(b)  10/10  21/21  7/7    14/14  5/5    12/12  10/16  7/1
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/A
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         T/T    T/T    T/T    T/T    C/C    T/T    T/T    T/T
10      3672         T/T    C/C    C/C    T/T    T/T    T/T    T/T    C/C
11      3707         C/C    C/C    C/C    T/T    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    G/G    A/A    A/A    A/A    A/A    G/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    A/A    A/A    A/A    A/A    G/G    G/A    A/A
15      4214         C/C    C/C    C/C    C/C    C/C    T/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
17      4313         G/G    G/G    G/G    A/A    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C

PS      PS           Haplotype Pair(c) (Part 2)
No.(a)  Position(b)  10/14  21/15  5/8    5/14   10/6   18/2   15/19  21/7
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    T/C    C/C    C/C    C/C    C/C    C/T    T/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/A    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    T/C    C/C    C/C    C/C    C/C    C/C    T/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         T/T    T/T    C/T    C/T    T/T    T/T    T/T    T/T
10      3672         T/T    C/T    T/C    T/T    T/C    T/T    T/T    C/C
11      3707         C/T    C/T    C/C    C/T    C/C    T/T    T/C    C/C
12      3979         A/A    A/A    A/G    A/A    A/A    A/A    A/A    A/G
13      3997         C/C    C/C    C/T    C/C    C/C    C/C    C/C    C/C
14      4140         G/A    A/A    A/A    A/A    G/G    G/A    A/A    A/A
15      4214         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/A    G/G    G/G    G/A    G/G    G/A    G/G    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C




PS      PS           Haplotype Pair(c) (Part 3)
No.(a)  Position(b)  5/21   10/13  10/1   14/1   10/21  10/18  18/15  10/7
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/A    G/A    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
10      3672         T/C    T/T    T/C    T/C    T/C    T/T    T/T    T/C
11      3707         C/C    C/C    C/C    T/C    C/C    C/T    T/T    C/C
12      3979         A/A    A/A    A/G    A/G    A/A    A/A    A/A    A/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/G    G/A    A/A    G/A    G/G    G/A    G/A
15      4214         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    A/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C

PS      PS           Haplotype Pair(c) (Part 4)
No.(a)  Position(b)  5/23   10/5   10/12  5/22   5/7    7/22   21/14  10/3
1       2690         C/T    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/G    A/A    A/G    A/A    A/A
4       2910         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/C
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/C    T/C    T/T    C/C    C/T    T/C    T/T    T/T
10      3672         T/T    T/T    T/T    T/C    T/C    C/C    C/T    T/T
11      3707         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    A/A    A/A    A/G    G/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/A    G/G    A/A    A/A    A/A    A/A    G/G
15      4214         C/C    C/C    C/T    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    G/G    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C




PS      PS           Haplotype Pair(c) (Part 5)
No.(a)  Position(b)  10/9   10/11  20/2   10/20  20/7   10/17  10/4   5/18
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    C/C    T/C    C/T    T/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/A    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/C    T/T
9       3470         T/T    T/T    T/T    T/T    T/T    T/T    T/T    C/T
10      3672         T/T    T/T    T/T    T/T    T/C    T/T    T/T    T/T
11      3707         C/C    C/C    T/T    C/T    T/C    C/T    C/C    C/T
12      3979         A/A    A/A    A/A    A/A    A/G    A/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    G/G    A/A    G/A    A/A    G/A    G/G    A/G
15      4214         C/C    C/C    C/C    C/C    C/C    C/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/A    G/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/A    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C



(a) PS = polymorphic site;

(b) Position of PS in SEQ ID NO:1;

(c) Haplotype pairs are represented as 1st haplotype/2nd haplotype, with alleles of each
haplotype shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column.

3. A method for genotyping the Duffy blood group (FY) gene of an individual, comprising
determining for the two copies of the FY gene present in the individual the identity of the
nucleotide pair at one or more polymorphic sites (PS) selected from the group consisting of
PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and
PS20, wherein the one or more polymorphic sites (PS) have the position and alternative alleles
shown in SEQ ID NO:1.

4. The method of claim 3, wherein the determining step comprises: 

(a) isolating from the individual a nucleic acid mixture comprising both copies of the FY
gene, or a fragment thereof, that are present in the individual;

(b) amplifying from the nucleic acid mixture a target region containing one of the selected 
polymorphic sites; 

(c) hybridizing a primer extension oligonucleotide to one allele of the amplified target region, 
wherein the oligonucleotide is designed for genotyping the selected polymorphic site in 
the target region; 

(d) performing a nucleic acid template-dependent, primer extension reaction on the 
hybridized oligonucleotide in the presence of at least one terminator of the reaction, 
wherein the terminator is complementary to one of the alternative nucleotides present at 
the selected polymorphic site; and 

(e) detecting the presence and identity of the terminator in the extended oligonucleotide. 

5. The method of claim 3, which comprises determining for the two copies of the FY gene present
in the individual the identity of the nucleotide pair at each of PS1-PS20.

6. A method for haplotyping the Duffy blood group (FY) gene of an individual which comprises
determining, for one copy of the FY gene present in the individual, the identity of the nucleotide
at two or more polymorphic sites (PS) selected from the group consisting of PS1, PS2, PS3,
PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20, wherein the
selected PS have the position and alternative alleles shown in SEQ ID NO:1.

7. The method of claim 6, further comprising determining the identity of the nucleotide at one or
more polymorphic sites selected from the group consisting of PS9, PS14, PS16 and PS17,
wherein the one or more polymorphic sites (PS) have the position and alternative alleles shown
in SEQ ID NO:1.

8. The method of claim 6, wherein the determining step comprises:

(a) isolating from the individual a nucleic acid sample containing only one of the two copies
of the FY gene, or a fragment thereof, that is present in the individual;

(b) amplifying from the nucleic acid sample a target region containing one of the selected
polymorphic sites;

(c) hybridizing a primer extension oligonucleotide to one allele of the amplified target region,
wherein the oligonucleotide is designed for haplotyping the selected polymorphic site in
the target region;

(d) performing a nucleic acid template-dependent, primer extension reaction on the hybridized
oligonucleotide in the presence of at least one terminator of the reaction, wherein the
terminator is complementary to one of the alternative nucleotides present at the selected
polymorphic site; and

(e) detecting the presence and identity of the terminator in the extended oligonucleotide.

9. A method for predicting a haplotype pair for the Duffy blood group (FY) gene of an individual
comprising:

(a) identifying a FY genotype for the individual, wherein the genotype comprises the
nucleotide pair at two or more polymorphic sites (PS) selected from the group consisting
of PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19
and PS20, wherein the selected PS have the position and alternative alleles shown in SEQ
ID NO:1;

(b) comparing the genotype to the haplotype pair data set forth in the table immediately
below; and

(c) determining which haplotype pair is consistent with the genotype of the individual and
with the haplotype pair data



PS      PS           Haplotype Pair(c) (Part 1)
No.(a)  Position(b)  10/10  21/21  7/7    14/14  5/5    12/12  10/16  7/1
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/A
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         T/T    T/T    T/T    T/T    C/C    T/T    T/T    T/T
10      3672         T/T    C/C    C/C    T/T    T/T    T/T    T/T    C/C
11      3707         C/C    C/C    C/C    T/T    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    G/G    A/A    A/A    A/A    A/A    G/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    A/A    A/A    A/A    A/A    G/G    G/A    A/A
15      4214         C/C    C/C    C/C    C/C    C/C    T/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
17      4313         G/G    G/G    G/G    A/A    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
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50 
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PS PS Haplotype Pair(c) (Part 2) 





No.(a) 


Position(b) 


10/14 


21/15 


5/8 


5/14 


10/6 


18/2 


15/19 


21/7 




1 


2690 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


40 


2 


2864 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


* 


3 


2882 


A/A 


A/A 


A/A 


A/A 


A/A 


A/A 


A/A 


A/A 




4 - 


2910 


C/C 


T/C 


C/C 


C/C 


C/C 


C/C 


C/T 


T/C 




5. 


2949 


C/C 


C/C 


C/C 


C/C 


C/C 


C/A 


C/C 


C/C 




6 


2980 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


45 


7 


2996 


C/C 


T/C 


C/C 


C/C 


C/C 


C/C 


C/C 


T/C 




8 


3259 


T/T 


T/T 


T/T 


T/T 


T/T 


T/T 


T/T 


T/T 




9 


3470 


T/T 


T/T 


C/T 


C/T 


T/T 


T/T 


T/T 


T/T 




10 


3672 


T/T 


C/T 


T/C 


T/T 


T/C 


T/T 


T/T 


C/C 




11 


3707 


C/T 


C/T 


C/C 


C/T 


C/C 


T/T 


T/C 


C/C 


50 


12 


3979 


A/A 


. A/A 


A/G 


A/A 


A/A 


A/A 


A/A 


A/G 




13 


3997 


C/C 


C/C 


C/T 


C/C 


C/C 


C/C 


C/C 


C/C 




14 


4140 


G/A 


A/A 


A/A 


A/A 


G/G 


y t I M 

G/A 


A/A 


A/A 




15 


4214 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 




16 


4280 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


55 


17 


4313 


G/A 


G/G 


G/G 


G/A 


G/G 


G/A 


G/G 


G/G 




18 


. 4617 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 




19 


4618 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 


G/G 




20 


. 4992 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 


C/C 

■ 


C/C 


PS      PS           Haplotype Pair(c) (Part 3)
No.(a)  Position(b)  5/21   10/13  10/1   14/1   10/21  10/18  18/15  10/7
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/A    G/A    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
10      3672         T/C    T/T    T/C    T/C    T/C    T/T    T/T    T/C
11      3707         C/C    C/C    C/C    T/C    C/C    C/T    T/T    C/C
12      3979         A/A    A/A    A/G    A/G    A/A    A/A    A/A    A/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/G    G/A    A/A    G/A    G/G    G/A    G/A
15      4214         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    A/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C


PS      PS           Haplotype Pair(c) (Part 4)
No.(a)  Position(b)  5/23   10/5   10/12  5/22   5/7    7/22   21/14  10/3
1       2690         C/T    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/G    A/A    A/G    A/A    A/A
4       2910         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/C
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/C    T/C    T/T    C/C    C/T    T/C    T/T    T/T
10      3672         T/T    T/T    T/T    T/C    T/C    C/C    C/T    T/T
11      3707         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    A/A    A/A    A/G    G/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/A    G/G    A/A    A/A    A/A    A/A    G/G
15      4214         C/C    C/C    C/T    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    G/G    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C




PS      PS           Haplotype Pair(c) (Part 5)
No.(a)  Position(b)  10/9   10/11  20/2   10/20  20/7   10/17  10/4   5/18
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    C/C    T/C    C/T    T/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/A    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/C    T/T
9       3470         T/T    T/T    T/T    T/T    T/T    T/T    T/T    C/T
10      3672         T/T    T/T    T/T    T/T    T/C    T/T    T/T    T/T
11      3707         C/C    C/C    T/T    C/T    T/C    C/T    C/C    C/T
12      3979         A/A    A/A    A/A    A/A    A/G    A/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    G/G    A/A    G/A    A/A    G/A    G/G    A/G
15      4214         C/C    C/C    C/C    C/C    C/C    C/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/A    G/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/A    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C



(a) PS = polymorphic site;

(b) Position of PS in SEQ ID NO:1;

(c) Haplotype pairs are represented as 1st haplotype/2nd haplotype; with alleles of each
haplotype shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column.
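
Steps (a) to (c) of claim 9 amount to a table lookup, which the following minimal sketch illustrates under stated assumptions. It uses only three haplotypes (5, 10 and 14) at four sites (PS9, PS11, PS14 and PS17), with alleles copied from the tables in these claims; the function names and the dictionary layout are hypothetical and are not part of the claimed method. A genotype is treated as unphased, so "G/A" and "A/G" are considered the same nucleotide pair.

```python
from itertools import combinations_with_replacement

# Alleles at PS9, PS11, PS14 and PS17 (in that order) for three haplotypes,
# copied from the tables in the claims. This is an illustrative subset only.
SITES = ("PS9", "PS11", "PS14", "PS17")
HAPLOTYPES = {
    5: "CCAG",
    10: "TCGG",
    14: "TTAA",
}


def pair_genotype(h1, h2):
    """Return the unordered nucleotide pair at each site for a haplotype pair."""
    return tuple(frozenset((a, b)) for a, b in zip(HAPLOTYPES[h1], HAPLOTYPES[h2]))


def consistent_pairs(genotype):
    """List haplotype pairs consistent with a genotype.

    `genotype` maps a site name to a string such as "G/A". Sites that are
    absent from the mapping are treated as untyped and are not compared.
    """
    observed = {site: frozenset(call.split("/")) for site, call in genotype.items()}
    hits = []
    for h1, h2 in combinations_with_replacement(sorted(HAPLOTYPES), 2):
        expected = dict(zip(SITES, pair_genotype(h1, h2)))
        if all(expected[site] == alleles for site, alleles in observed.items()):
            hits.append((h1, h2))
    return hits


if __name__ == "__main__":
    full = {"PS9": "T/T", "PS11": "C/T", "PS14": "G/A", "PS17": "G/A"}
    print(consistent_pairs(full))             # a fully typed genotype
    print(consistent_pairs({"PS14": "A/A"}))  # a single site leaves ambiguity
```

In this sketch a genotype typed at all four sites resolves to one pair, while a genotype typed at fewer sites can remain consistent with several pairs, which is why the claim recites two or more polymorphic sites.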

10. The method of claim 9, wherein the identified genotype of the individual comprises the
nucleotide pair at each of PS1-PS20, which have the position and alternative alleles shown in
SEQ ID NO:1.
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11. A method for identifying an association between a trait and at least one haplotype or haplotype 
pair of the Duffy blood group (FY) gene which comprises comparing the frequency of the 
haplotype or haplotype pair in a population exhibiting the trait with the frequency of the 
haplotype or haplotype pair in a reference population, wherein the haplotype is selected from 
haplotypes 1-23 shown in the table presented immediately below, wherein each of the 
haplotypes comprises a sequence of polymorphisms whose positions and identities are set forth 
in the table immediately below: 



PS      PS           Haplotype Number(c) (Part 1)
No.(a)  Position(b)  1   2   3   4   5   6   7   8   9   10
1       2690         C   C   C   C   C   C   C   C   C   C
2       2864         A   G   G   G   G   G   G   G   G   G
3       2882         A   A   A   A   A   A   A   A   A   A
4       2910         C   C   C   C   C   C   C   C   C   C
5       2949         C   A   C   C   C   C   C   C   C   C
6       2980         G   G   C   G   G   G   G   G   G   G
7       2996         C   C   C   C   C   C   C   C   C   C
8       3259         T   T   T   C   T   T   T   T   T   T
9       3470         T   T   T   T   C   T   T   T   T   T
10      3672         C   T   T   T   T   C   C   C   T   T
11      3707         C   T   C   C   C   C   C   C   C   C
12      3979         G   A   A   A   A   A   G   G   A   A
13      3997         C   C   C   C   C   C   C   T   C   C
14      4140         A   A   G   G   A   G   A   A   G   G
15      4214         C   C   C   C   C   C   C   C   C   C
16      4280         C   C   C   C   C   C   C   C   C   C
17      4313         G   A   G   G   G   G   G   G   G   G
18      4617         C   C   C   C   C   C   C   C   C   C
19      4618         G   G   G   G   G   G   G   G   A   G
20      4992         C   C   C   C   C   C   C   C   C   C
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(a) PS = polymorphic site; 

(b) Position of PS within SEQ ID NO:1;

(c) Alleles for haplotypes are presented 5' to 3' in each column;

and wherein the haplotype pair is selected from the haplotype pairs shown in the table
immediately below, wherein each of the FY haplotype pairs consists of first and second
haplotypes which comprise first and second sequences of polymorphisms whose positions in
SEQ ID NO:1 and identities are set forth in the table immediately below:
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G/G 


G/G 


G/G 


3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    C/C    T/C    C/T    T/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/A    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/C    T/T
9       3470         T/T    T/T    T/T    T/T    T/T    T/T    T/T    C/T
10      3672         T/T    T/T    T/T    T/T    T/C    T/T    T/T    T/T
11      3707         C/C    C/C    T/T    C/T    T/C    C/T    C/C    C/T
12      3979         A/A    A/A    A/A    A/A    A/G    A/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    G/G    A/A    G/A    A/A    G/A    G/G    A/G
15      4214         C/C    C/C    C/C    C/C    C/C    C/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/A    G/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/A    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C



(a) PS = polymorphic site;
(b) Position of PS in SEQ ID NO:1;

(c) Haplotype pairs are represented as 1st haplotype/2nd haplotype; with alleles of each
haplotype shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;

wherein a higher frequency of the haplotype or haplotype pair in the trait population than in the
reference population indicates the trait is associated with the haplotype or haplotype pair.
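
The frequency comparison recited in claim 11 can be sketched as follows. This is an illustration only: the two populations are invented toy data, the function names are hypothetical, and the sketch simply compares raw frequencies as the claim recites. A real association study would presumably also apply a statistical test, which is outside what the claim states and is not shown here.

```python
from collections import Counter


def haplotype_frequency(pairs, haplotype):
    """Fraction of chromosomes that carry `haplotype`.

    `pairs` is a list of (first, second) haplotype numbers, one tuple per
    individual, so each individual contributes two chromosomes.
    """
    counts = Counter(h for pair in pairs for h in pair)
    total = 2 * len(pairs)
    return counts[haplotype] / total if total else 0.0


def more_frequent_in_trait(trait_pairs, reference_pairs, haplotype):
    """True when the haplotype is more frequent in the trait population."""
    return (haplotype_frequency(trait_pairs, haplotype)
            > haplotype_frequency(reference_pairs, haplotype))


if __name__ == "__main__":
    # Invented example populations; not data from the specification.
    trait = [(10, 14), (14, 14), (5, 14), (10, 10)]
    reference = [(10, 10), (10, 5), (10, 14), (5, 5)]
    print(haplotype_frequency(trait, 14))      # 4 of 8 chromosomes
    print(haplotype_frequency(reference, 14))  # 1 of 8 chromosomes
    print(more_frequent_in_trait(trait, reference, 14))
```

Counting chromosomes rather than individuals is a design choice of this sketch; frequencies of haplotype pairs could be compared the same way by counting whole tuples instead.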

12. The method of claim 11, wherein the trait is a clinical response to a drug targeting FY or to a
drug for treating a condition or disease associated with FY activity.

13. An isolated oligonucleotide designed for detecting a polymorphism in the Duffy blood group
(FY) gene at a polymorphic site (PS) selected from the group consisting of PS1, PS2, PS3, PS4,
PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and PS20, wherein the
selected PS have the position and alternative alleles shown in SEQ ID NO:1.

14. The isolated oligonucleotide of claim 13, which is an allele-specific oligonucleotide that
specifically hybridizes to an allele of the FY gene at a region containing the polymorphic site.

15. The allele-specific oligonucleotide of claim 14, which comprises a nucleotide sequence selected
from the group consisting of SEQ ID NOS:4-19, the complements of SEQ ID NOS:4-19, and
SEQ ID NOS:20-51.

16. The isolated oligonucleotide of claim 13, which is a primer-extension oligonucleotide.

17. The primer-extension oligonucleotide of claim 16, which comprises a nucleotide sequence
selected from the group consisting of SEQ ID NOS:52-83.

18. A kit for haplotyping or genotyping the Duffy blood group (FY) gene of an individual, which
comprises a set of oligonucleotides designed to haplotype or genotype each of polymorphic sites
(PS) PS1, PS2, PS3, PS4, PS5, PS6, PS7, PS8, PS10, PS11, PS12, PS13, PS15, PS18, PS19 and
PS20, wherein the selected PS have the position and alternative alleles shown in SEQ ID NO:1.

19. The kit of claim 18, which further comprises oligonucleotides designed to genotype or haplotype
each of PS9, PS14, PS16 and PS17, wherein the selected PS have the position and alternative
alleles shown in SEQ ID NO:1.

20. An isolated polynucleotide comprising a nucleotide sequence selected from the group consisting
of:
(a) a first nucleotide sequence which comprises a Duffy blood group (FY) isogene, wherein
the FY isogene is selected from the group consisting of isogenes 1-23 shown in the table
immediately below and wherein each of the isogenes comprises the regions of SEQ ID
NO:1 shown in the table immediately below and wherein each of the isogenes 1-23 is
further defined by the corresponding sequence of polymorphisms whose positions and
identities are set forth in the table immediately below; and



Region       PS      PS           Isogene Number(d) (Part 1)
Examined(a)  No.(b)  Position(c)  1   2   3   4   5   6   7   8   9   10
2486-3779    1       2690         C   C   C   C   C   C   C   C   C   C
2486-3779    2       2864         A   G   G   G   G   G   G   G   G   G
2486-3779    3       2882         A   A   A   A   A   A   A   A   A   A
2486-3779    4       2910         C   C   C   C   C   C   C   C   C   C
2486-3779    5       2949         C   A   C   C   C   C   C   C   C   C
2486-3779    6       2980         G   G   C   G   G   G   G   G   G   G
2486-3779    7       2996         C   C   C   C   C   C   C   C   C   C
2486-3779    8       3259         T   T   T   C   T   T   T   T   T   T
2486-3779    9       3470         T   T   T   T   C   T   T   T   T   T
2486-3779    10      3672         C   T   T   T   T   C   C   C   T   T
2486-3779    11      3707         C   T   C   C   C   C   C   C   C   C
3872-5302    12      3979         G   A   A   A   A   A   G   G   A   A
3872-5302    13      3997         C   C   C   C   C   C   C   T   C   C
3872-5302    14      4140         A   A   G   G   A   G   A   A   G   G
3872-5302    15      4214         C   C   C   C   C   C   C   C   C   C
3872-5302    16      4280         C   C   C   C   C   C   C   C   C   C
3872-5302    17      4313         G   A   G   G   G   G   G   G   G   G
3872-5302    18      4617         C   C   C   C   C   C   C   C   C   C
3872-5302    19      4618         G   G   G   G   G   G   G   G   A   G
3872-5302    20      4992         C   C   C   C   C   C   C   C   C   C

Region       PS      PS           Isogene Number(d) (Part 2)
Examined(a)  No.(b)  Position(c)  11  12  13  14  15  16  17  18  19  20
2486-3779    1       2690         C   C   C   C   C   C   C   C   C   C
2486-3779    2       2864         G   G   G   G   G   G   G   G   G   G
2486-3779    3       2882         A   A   A   A   A   A   A   A   A   A
2486-3779    4       2910         C   C   C   C   C   C   C   C   T   T
2486-3779    5       2949         C   C   C   C   C   C   C   C   C   C
2486-3779    6       2980         G   G   G   G   G   G   G   G   G   G
2486-3779    7       2996         C   C   C   C   C   C   C   C   C   C
2486-3779    8       3259         T   T   T   T   T   T   T   T   T   T
2486-3779    9       3470         T   T   T   T   T   T   T   T   T   T
2486-3779    10      3672         T   T   T   T   T   T   T   T   T   T
2486-3779    11      3707         C   C   C   T   T   T   T   T   C   T
3872-5302    12      3979         A   A   A   A   A   A   A   A   A   A
3872-5302    13      3997         C   C   C   C   C   C   C   C   C   C
3872-5302    14      4140         G   G   G   A   A   A   A   G   A   A
3872-5302    15      4214         C   T   T   C   C   C   T   C   C   C
3872-5302    16      4280         C   C   C   C   C   T   C   C   C   C
3872-5302    17      4313         G   G   G   A   G   A   G   G   G   G
3872-5302    18      4617         T   C   C   C   C   C   C   C   C   C
3872-5302    19      4618         G   G   G   G   G   G   G   G   G   G
3872-5302    20      4992         C   C   T   C   C   C   C   C   C   C


Region       PS      PS           Isogene Number(d) (Part 3)
Examined(a)  No.(b)  Position(c)  21  22  23
2486-3779    1       2690         C   C   T
2486-3779    2       2864         G   G   G
2486-3779    3       2882         A   G   A
2486-3779    4       2910         T   C   C
2486-3779    5       2949         C   C   C
2486-3779    6       2980         G   G   G
2486-3779    7       2996         T   C   C
2486-3779    8       3259         T   T   T
2486-3779    9       3470         T   C   C
2486-3779    10      3672         C   C   T
2486-3779    11      3707         C   C   C
3872-5302    12      3979         A   A   A
3872-5302    13      3997         C   C   C
3872-5302    14      4140         A   A   A
3872-5302    15      4214         C   C   C
3872-5302    16      4280         C   C   C
3872-5302    17      4313         G   G   G
3872-5302    18      4617         C   C   C
3872-5302    19      4618         G   G   G
3872-5302    20      4992         C   C   C

















(a) Region examined represents the nucleotide positions defining the start and stop positions
within the 1st SEQ ID NO of the sequenced region;

(b) PS = polymorphic site;

(c) Position of PS in SEQ ID NO:1;

(d) Alleles for isogenes are presented 5' to 3' in each column;

(b) a second nucleotide sequence which is complementary to the first nucleotide sequence.

21. The isolated polynucleotide of claim 20, which is a DNA molecule and comprises both the first
and second nucleotide sequences and further comprises expression regulatory elements operably
linked to the first nucleotide sequence.

22. A recombinant nonhuman organism transformed or transfected with the isolated polynucleotide 
of claim 21, wherein the organism expresses a FY protein that is encoded by the first nucleotide 
sequence. 

23. The recombinant nonhuman organism of claim 22, which is a transgenic animal.

24. An isolated fragment of a Duffy blood group (FY) isogene, wherein the fragment comprises at
least 10 nucleotides in one of the regions of SEQ ID NO:1 shown in the table immediately
below and wherein the fragment comprises one or more polymorphisms selected from the group
consisting of thymine at PS1, adenine at PS2, guanine at PS3, thymine at PS4, adenine at PS5,
cytosine at PS6, thymine at PS7, cytosine at PS8, thymine at PS10, thymine at PS11, guanine at
PS12, thymine at PS13, thymine at PS15, thymine at PS18, adenine at PS19 and thymine at
PS20, wherein the selected polymorphism has the position set forth in the table immediately
below:



Region       PS      PS           Isogene Number(d) (Part 1)
Examined(a)  No.(b)  Position(c)  1   2   3   4   5   6   7   8   9   10
2486-3779    1       2690         C   C   C   C   C   C   C   C   C   C
2486-3779    2       2864         A   G   G   G   G   G   G   G   G   G
2486-3779    3       2882         A   A   A   A   A   A   A   A   A   A
2486-3779    4       2910         C   C   C   C   C   C   C   C   C   C
2486-3779    5       2949         C   A   C   C   C   C   C   C   C   C
2486-3779    6       2980         G   G   C   G   G   G   G   G   G   G
2486-3779    7       2996         C   C   C   C   C   C   C   C   C   C
2486-3779    8       3259         T   T   T   C   T   T   T   T   T   T
2486-3779    9       3470         T   T   T   T   C   T   T   T   T   T
2486-3779    10      3672         C   T   T   T   T   C   C   C   T   T
2486-3779    11      3707         C   T   C   C   C   C   C   C   C   C
3872-5302    12      3979         G   A   A   A   A   A   G   G   A   A
3872-5302    13      3997         C   C   C   C   C   C   C   T   C   C
3872-5302    14      4140         A   A   G   G   A   G   A   A   G   G
3872-5302    15      4214         C   C   C   C   C   C   C   C   C   C
3872-5302    16      4280         C   C   C   C   C   C   C   C   C   C
3872-5302    17      4313         G   A   G   G   G   G   G   G   G   G
3872-5302    18      4617         C   C   C   C   C   C   C   C   C   C
3872-5302    19      4618         G   G   G   G   G   G   G   G   A   G
3872-5302    20      4992         C   C   C   C   C   C   C   C   C   C

Region       PS      PS           Isogene Number(d) (Part 2)
Examined(a)  No.(b)  Position(c)  11  12  13  14  15  16  17  18  19  20
2486-3779    1       2690         C   C   C   C   C   C   C   C   C   C
2486-3779    2       2864         G   G   G   G   G   G   G   G   G   G
2486-3779    3       2882         A   A   A   A   A   A   A   A   A   A
2486-3779    4       2910         C   C   C   C   C   C   C   C   T   T
2486-3779    5       2949         C   C   C   C   C   C   C   C   C   C
2486-3779    6       2980         G   G   G   G   G   G   G   G   G   G
2486-3779    7       2996         C   C   C   C   C   C   C   C   C   C
2486-3779    8       3259         T   T   T   T   T   T   T   T   T   T
2486-3779    9       3470         T   T   T   T   T   T   T   T   T   T
2486-3779    10      3672         T   T   T   T   T   T   T   T   T   T
2486-3779    11      3707         C   C   C   T   T   T   T   T   C   T
3872-5302    12      3979         A   A   A   A   A   A   A   A   A   A
3872-5302    13      3997         C   C   C   C   C   C   C   C   C   C
3872-5302    14      4140         G   G   G   A   A   A   A   G   A   A
3872-5302    15      4214         C   T   T   C   C   C   T   C   C   C
3872-5302    16      4280         C   C   C   C   C   T   C   C   C   C
3872-5302    17      4313         G   G   G   A   G   A   G   G   G   G
3872-5302    18      4617         T   C   C   C   C   C   C   C   C   C
3872-5302    19      4618         G   G   G   G   G   G   G   G   G   G
3872-5302    20      4992         C   C   T   C   C   C   C   C   C   C




Region       PS      PS           Isogene Number(d) (Part 3)
Examined(a)  No.(b)  Position(c)  21  22  23
2486-3779    1       2690         C   C   T
2486-3779    2       2864         G   G   G
2486-3779    3       2882         A   G   A
2486-3779    4       2910         T   C   C
2486-3779    5       2949         C   C   C
2486-3779    6       2980         G   G   G
2486-3779    7       2996         T   C   C
2486-3779    8       3259         T   T   T
2486-3779    9       3470         T   C   C
2486-3779    10      3672         C   C   T
2486-3779    11      3707         C   C   C
3872-5302    12      3979         A   A   A
3872-5302    13      3997         C   C   C
3872-5302    14      4140         A   A   A
3872-5302    15      4214         C   C   C
3872-5302    16      4280         C   C   C
3872-5302    17      4313         G   G   G
3872-5302    18      4617         C   C   C
3872-5302    19      4618         G   G   G
3872-5302    20      4992         C   C   C

















(a) Region examined represents the nucleotide positions defining the start and stop positions
within SEQ ID NO:1 of the regions sequenced;

(b) PS = polymorphic site;

(c) Position of PS within SEQ ID NO:1;

(d) Alleles for FY isogenes are presented 5' to 3' in each column.

25. An isolated polynucleotide comprising a coding sequence for a FY isogene, wherein the coding
sequence comprises the regions of SEQ ID NO:2, except at each of the polymorphic sites which
have the positions in SEQ ID NO:2 and polymorphisms set forth in the table immediately below:



PS      PS           Isogene Coding Sequence Number(c) (Part 1)
No.(a)  Position(b)  2c   3c   4c   6c   9c   10c  11c  12c  13c  14c
14      131          A    G    G    G    G    G    G    G    G    A
15      205          C    C    C    C    C    C    C    T    T    C
16      271          C    C    C    C    C    C    C    C    C    C
17      304          A    G    G    G    G    G    G    G    G    A
18      608          C    C    C    C    C    C    T    C    C    C
19      609          G    G    G    G    A    G    G    G    G    G
20      983          C    C    C    C    C    C    C    C    T    C

PS      PS           Isogene Coding Sequence Number(c) (Part 2)
No.(a)  Position(b)  16c  17c  18c
14      131          A    A    G
15      205          C    T    C
16      271          T    C    C
17      304          A    G    G
18      608          C    C    C
19      609          G    G    G
20      983          C    C    C

















(a) PS = polymorphic site; 

(b) Position of PS in SEQ ID NO:2; 

(c) Alleles for the isogene coding sequence are presented 5' to 3' in each column; the numerical
portion of the isogene coding sequence number represents the number of the parent full FY
isogene.

26. A recombinant nonhuman organism transformed or transfected with the isolated polynucleotide
of claim 25, wherein the organism expresses a Duffy blood group (FY) protein that is encoded
by the polymorphic variant sequence.

27. The recombinant nonhuman organism of claim 26, which is a transgenic animal. 

28. An isolated fragment of a FY coding sequence, wherein the fragment comprises one or more 
polymorphisms selected from the group consisting of thymine at a position corresponding to 
nucleotide 205, thymine at a position corresponding to nucleotide 608, adenine at a position 
corresponding to nucleotide 609 and thymine at a position corresponding to nucleotide 983 in 
SEQ ID NO:2. 
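
As a rough illustration of the test implied by this claim, the following Python sketch reports which of the four recited variant alleles are present in a fragment whose starting position in SEQ ID NO:2 is known. The function name `variants_in_fragment` and the three-base example fragments are hypothetical and chosen only for demonstration.

```python
# Variant alleles recited in the claim, keyed by 1-based position in SEQ ID NO:2.
VARIANT_ALLELES = {205: "T", 608: "T", 609: "A", 983: "T"}


def variants_in_fragment(fragment: str, start: int) -> list[int]:
    """List the positions whose variant allele is present in `fragment`.

    `start` is the 1-based position in SEQ ID NO:2 of the fragment's first base.
    """
    found = []
    for position, base in sorted(VARIANT_ALLELES.items()):
        offset = position - start
        if 0 <= offset < len(fragment) and fragment[offset].upper() == base:
            found.append(position)
    return found


# Toy fragments covering positions 607-609 (assumed for illustration only).
print(variants_in_fragment("ATA", 607))
print(variants_in_fragment("ACG", 607))
```

A fragment that does not overlap any of the four positions simply returns an empty list.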

29. An isolated polypeptide comprising an amino acid sequence which is a polymorphic variant of a
reference sequence for the Duffy blood group (FY) protein, wherein the reference sequence 
comprises SEQ ID NO:3, except the polymorphic variant comprises one or more variant amino 
acids selected from the group consisting of phenylalanine at a position corresponding to amino 
acid position 69, isoleucine at a position corresponding to amino acid position 203, isoleucine at 
a position corresponding to amino acid position 203 and phenylalanine at a position 
corresponding to amino acid position 328. 
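
The amino acid positions recited here follow arithmetically from the coding-sequence positions of the polymorphic sites (205, 608, 609 and 983 in SEQ ID NO:2). A minimal Python sketch of that arithmetic is given below; it assumes that position 1 of SEQ ID NO:2 is the first base of the start codon, and the function names are hypothetical.

```python
def codon_number(nucleotide_position: int) -> int:
    """Codon (amino acid) number containing a 1-based coding-sequence position."""
    return (nucleotide_position - 1) // 3 + 1


def position_in_codon(nucleotide_position: int) -> int:
    """Position (1, 2 or 3) of the nucleotide within its codon."""
    return (nucleotide_position - 1) % 3 + 1


for position in (205, 608, 609, 983):
    print(position, codon_number(position), position_in_codon(position))
```

Positions 608 and 609 fall in the same codon, which is why amino acid position 203 is associated with two polymorphic sites.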

30. An isolated monoclonal antibody specific for and immunoreactive with the isolated polypeptide 
of claim 29. 

31. A method for screening for drugs targeting the isolated polypeptide of claim 29 which comprises
contacting the FY polymorphic variant with a candidate agent and assaying for binding activity. 

32. An isolated fragment of a FY protein, wherein the fragment comprises one or more variant 
amino acids selected from the group consisting of phenylalanine at a position corresponding to
amino acid position 69, isoleucine at a position corresponding to amino acid position 203, 
isoleucine at a position corresponding to amino acid position 203 and phenylalanine at a position 
corresponding to amino acid position 328 in SEQ ID NO:3. 

33. A computer system for storing and analyzing polymorphism data for the Duffy blood group
gene, comprising:
(a) a central processing unit (CPU);
(b) a communication interface;
(c) a display device;
(d) an input device; and
(e) a database containing the polymorphism data;
wherein the polymorphism data comprises any one or more of the haplotypes set forth in the
table immediately below:


PS      PS           Haplotype Number(c) (Part 1)
No.(a)  Position(b)  1    2    3    4    5    6    7    8    9    10
1       2690         C    C    C    C    C    C    C    C    C    C
2       2864         A    G    G    G    G    G    G    G    G    G
3       2882         A    A    A    A    A    A    A    A    A    A
4       2910         C    C    C    C    C    C    C    C    C    C
5       2949         C    A    C    C    C    C    C    C    C    C
6       2980         G    G    C    G    G    G    G    G    G    G
7       2996         C    C    C    C    C    C    C    C    C    C
8       3259         T    T    T    C    T    T    T    T    T    T
9       3470         T    T    T    T    C    T    T    T    T    T
10      3672         C    T    T    T    T    C    C    C    T    T
11      3707         C    T    C    C    C    C    C    C    C    C
12      3979         G    A    A    A    A    A    G    G    A    A
13      3997         C    C    C    C    C    C    C    T    C    C
14      4140         A    A    G    G    A    G    A    A    G    G
15      4214         C    C    C    C    C    C    C    C    C    C
16      4280         C    C    C    C    C    C    C    C    C    C
17      4313         G    A    G    G    G    G    G    G    G    G
18      4617         C    C    C    C    C    C    C    C    C    C
19      4618         G    G    G    G    G    G    G    G    A    G
20      4992         C    C    C    C    C    C    C    C    C    C

PS      PS           Haplotype Number(c) (Part 2)
No.(a)  Position(b)  11   12   13   14   15   16   17   18   19   20
1       2690         C    C    C    C    C    C    C    C    C    C
2       2864         G    G    G    G    G    G    G    G    G    G
3       2882         A    A    A    A    A    A    A    A    A    A
4       2910         C    C    C    C    C    C    C    C    T    T
5       2949         C    C    C    C    C    C    C    C    C    C
6       2980         G    G    G    G    G    G    G    G    G    G
7       2996         C    C    C    C    C    C    C    C    C    C
8       3259         T    T    T    T    T    T    T    T    T    T
9       3470         T    T    T    T    T    T    T    T    T    T
10      3672         T    T    T    T    T    T    T    T    T    T
11      3707         C    C    C    T    T    T    T    T    C    T
12      3979         A    A    A    A    A    A    A    A    A    A
13      3997         C    C    C    C    C    C    C    C    C    C
14      4140         G    G    G    A    A    A    A    G    A    A
15      4214         C    T    T    C    C    C    T    C    C    C
16      4280         C    C    C    C    C    T    C    C    C    C
17      4313         G    G    G    A    G    A    G    G    G    G
18      4617         T    C    C    C    C    C    C    C    C    C
19      4618         G    G    G    G    G    G    G    G    G    G
20      4992         C    C    T    C    C    C    C    C    C    C


PS      PS           Haplotype Number(c) (Part 3)
No.(a)  Position(b)  21   22   23
1       2690         C    C    T
2       2864         G    G    G
3       2882         A    G    A
4       2910         T    C    C
5       2949         C    C    C
6       2980         G    G    G
7       2996         T    C    C
8       3259         T    T    T
9       3470         T    C    C
10      3672         C    C    T
11      3707         C    C    C
12      3979         A    A    A
13      3997         C    C    C
14      4140         A    A    A
15      4214         C    C    C
16      4280         C    C    C
17      4313         G    G    G
18      4617         C    C    C
19      4618         G    G    G
20      4992         C    C    C

(a) PS = polymorphic site;

(b) Position of PS within SEQ ID NO:1;

(c) Alleles for haplotypes are presented 5' to 3' in each column;
the haplotype pairs set forth in the table immediately below: 





PS      PS           Haplotype Pair(c) (Part 1)
No.(a)  Position(b)  10/10  21/21  7/7    14/14  5/5    12/12  10/16  7/1
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/A
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    T/T    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         T/T    T/T    T/T    T/T    C/C    T/T    T/T    T/T
10      3672         T/T    C/C    C/C    T/T    T/T    T/T    T/T    C/C
11      3707         C/C    C/C    C/C    T/T    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    G/G    A/A    A/A    A/A    A/A    G/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    A/A    A/A    A/A    A/A    G/G    G/A    A/A
15      4214         C/C    C/C    C/C    C/C    C/C    T/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
17      4313         G/G    G/G    G/G    A/A    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C


PS      PS           Haplotype Pair(c) (Part 2)
No.(a)  Position(b)  10/14  21/15  5/8    5/14   10/6   18/2   15/19  21/7
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    T/C    C/C    C/C    C/C    C/C    C/T    T/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/A    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    T/C    C/C    C/C    C/C    C/C    C/C    T/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         T/T    T/T    C/T    C/T    T/T    T/T    T/T    T/T
10      3672         T/T    C/T    T/C    T/T    T/C    T/T    T/T    C/C
11      3707         C/T    C/T    C/C    C/T    C/C    T/T    T/C    C/C
12      3979         A/A    A/A    A/G    A/A    A/A    A/A    A/A    A/G
13      3997         C/C    C/C    C/T    C/C    C/C    C/C    C/C    C/C
14      4140         G/A    A/A    A/A    A/A    G/G    G/A    A/A    A/A
15      4214         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/A    G/G    G/G    G/A    G/G    G/A    G/G    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C

PS      PS           Haplotype Pair(c) (Part 3)
No.(a)  Position(b)  5/21   10/13  10/1   14/1   10/21  10/18  18/15  10/7
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/A    G/A    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/T    C/C    C/C    C/C    C/T    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
10      3672         T/C    T/T    T/C    T/C    T/C    T/T    T/T    T/C
11      3707         C/C    C/C    C/C    T/C    C/C    C/T    T/T    C/C
12      3979         A/A    A/A    A/G    A/G    A/A    A/A    A/A    A/G
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/G    G/A    A/A    G/A    G/G    G/A    G/A
15      4214         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    A/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C


PS      PS           Haplotype Pair(c) (Part 4)
No.(a)  Position(b)  5/23   10/5   10/12  5/22   5/7    7/22   21/14  10/3
1       2690         C/T    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/G    A/A    A/G    A/A    A/A
4       2910         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
5       2949         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/C
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    T/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/T    T/T
9       3470         C/C    T/C    T/T    C/C    C/T    T/C    T/T    T/T
10      3672         T/T    T/T    T/T    T/C    T/C    C/C    C/T    T/T
11      3707         C/C    C/C    C/C    C/C    C/C    C/C    C/T    C/C
12      3979         A/A    A/A    A/A    A/A    A/G    G/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         A/A    G/A    G/G    A/A    A/A    A/A    A/A    G/G
15      4214         C/C    C/C    C/T    C/C    C/C    C/C    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/G    G/G    G/G    G/G    G/A    G/G
18      4617         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C

PS      PS           Haplotype Pair(c) (Part 5)
No.(a)  Position(b)  10/9   10/11  20/2   10/20  20/7   10/17  10/4   5/18
1       2690         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
2       2864         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
3       2882         A/A    A/A    A/A    A/A    A/A    A/A    A/A    A/A
4       2910         C/C    C/C    T/C    C/T    T/C    C/C    C/C    C/C
5       2949         C/C    C/C    C/A    C/C    C/C    C/C    C/C    C/C
6       2980         G/G    G/G    G/G    G/G    G/G    G/G    G/G    G/G
7       2996         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
8       3259         T/T    T/T    T/T    T/T    T/T    T/T    T/C    T/T
9       3470         T/T    T/T    T/T    T/T    T/T    T/T    T/T    C/T
10      3672         T/T    T/T    T/T    T/T    T/C    T/T    T/T    T/T
11      3707         C/C    C/C    T/T    C/T    T/C    C/T    C/C    C/T
12      3979         A/A    A/A    A/A    A/A    A/G    A/A    A/A    A/A
13      3997         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
14      4140         G/G    G/G    A/A    G/A    A/A    G/A    G/G    A/G
15      4214         C/C    C/C    C/C    C/C    C/C    C/T    C/C    C/C
16      4280         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C
17      4313         G/G    G/G    G/A    G/G    G/G    G/G    G/G    G/G
18      4617         C/C    C/T    C/C    C/C    C/C    C/C    C/C    C/C
19      4618         G/A    G/G    G/G    G/G    G/G    G/G    G/G    G/G
20      4992         C/C    C/C    C/C    C/C    C/C    C/C    C/C    C/C



(a) PS = polymorphic site;

(b) Position of PS in SEQ ID NO:1;

(c) Haplotype pairs are represented as 1st haplotype/2nd haplotype; with alleles of each
haplotype shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;

and the frequency data in Tables 5 and 6. 
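
The relationship between the haplotype table and the haplotype pair table can be illustrated with a short Python sketch that combines two haplotypes into the "1st allele/2nd allele" notation of footnote (c). It is a sketch only: the function name `haplotype_pair` is hypothetical, and just four of the twenty polymorphic sites are included, with alleles copied from the haplotype table for haplotypes 10 and 16.

```python
# Alleles of haplotypes 10 and 16 at four of the polymorphic sites
# (positions in SEQ ID NO:1), copied from the haplotype table.
PS_POSITIONS = (3707, 4140, 4280, 4313)
HAPLOTYPES = {
    10: ("C", "G", "C", "G"),
    16: ("T", "A", "T", "A"),
}


def haplotype_pair(first: int, second: int) -> dict[int, str]:
    """Map each PS position to '1st haplotype allele/2nd haplotype allele'."""
    return {
        position: f"{a}/{b}"
        for position, a, b in zip(PS_POSITIONS, HAPLOTYPES[first], HAPLOTYPES[second])
    }


print(haplotype_pair(10, 16))
```

The result for pair 10/16 agrees with the corresponding column of the haplotype pair table at these four sites.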
34. A genome anthology for the Duffy blood group (FY) gene which comprises two or more FY
isogenes selected from the group consisting of isogenes 1-23 shown in the table immediately
below, and wherein each of the isogenes comprises the regions of SEQ ID NO:1 shown in the
table immediately below and wherein each of the isogenes 1-23 is further defined by the
corresponding sequence of polymorphisms whose positions and identities are set forth in the
table immediately below:
Region       PS      PS           Isogene Number(d) (Part 1)
Examined(a)  No.(b)  Position(c)  1    2    3    4    5    6    7    8    9    10
2486-3779    1       2690         C    C    C    C    C    C    C    C    C    C
2486-3779    2       2864         A    G    G    G    G    G    G    G    G    G
2486-3779    3       2882         A    A    A    A    A    A    A    A    A    A
2486-3779    4       2910         C    C    C    C    C    C    C    C    C    C
2486-3779    5       2949         C    A    C    C    C    C    C    C    C    C
2486-3779    6       2980         G    G    C    G    G    G    G    G    G    G
2486-3779    7       2996         C    C    C    C    C    C    C    C    C    C
2486-3779    8       3259         T    T    T    C    T    T    T    T    T    T
2486-3779    9       3470         T    T    T    T    C    T    T    T    T    T
2486-3779    10      3672         C    T    T    T    T    C    C    C    T    T
2486-3779    11      3707         C    T    C    C    C    C    C    C    C    C
3872-5302    12      3979         G    A    A    A    A    A    G    G    A    A
3872-5302    13      3997         C    C    C    C    C    C    C    T    C    C
3872-5302    14      4140         A    A    G    G    A    G    A    A    G    G
3872-5302    15      4214         C    C    C    C    C    C    C    C    C    C
3872-5302    16      4280         C    C    C    C    C    C    C    C    C    C
3872-5302    17      4313         G    A    G    G    G    G    G    G    G    G
3872-5302    18      4617         C    C    C    C    C    C    C    C    C    C
3872-5302    19      4618         G    G    G    G    G    G    G    G    A    G
3872-5302    20      4992         C    C    C    C    C    C    C    C    C    C



Region       PS      PS           Isogene Number(d) (Part 2)
Examined(a)  No.(b)  Position(c)  11   12   13   14   15   16   17   18   19   20
2486-3779    1       2690         C    C    C    C    C    C    C    C    C    C
2486-3779    2       2864         G    G    G    G    G    G    G    G    G    G
2486-3779    3       2882         A    A    A    A    A    A    A    A    A    A
2486-3779    4       2910         C    C    C    C    C    C    C    C    T    T
2486-3779    5       2949         C    C    C    C    C    C    C    C    C    C
2486-3779    6       2980         G    G    G    G    G    G    G    G    G    G
2486-3779    7       2996         C    C    C    C    C    C    C    C    C    C
2486-3779    8       3259         T    T    T    T    T    T    T    T    T    T
2486-3779    9       3470         T    T    T    T    T    T    T    T    T    T
2486-3779    10      3672         T    T    T    T    T    T    T    T    T    T
2486-3779    11      3707         C    C    C    T    T    T    T    T    C    T
3872-5302    12      3979         A    A    A    A    A    A    A    A    A    A
3872-5302    13      3997         C    C    C    C    C    C    C    C    C    C
3872-5302    14      4140         G    G    G    A    A    A    A    G    A    A
3872-5302    15      4214         C    T    T    C    C    C    T    C    C    C
3872-5302    16      4280         C    C    C    C    C    T    C    C    C    C
3872-5302    17      4313         G    G    G    A    G    A    G    G    G    G
3872-5302    18      4617         T    C    C    C    C    C    C    C    C    C
3872-5302    19      4618         G    G    G    G    G    G    G    G    G    G
3872-5302    20      4992         C    C    T    C    C    C    C    C    C    C

Region       PS      PS           Isogene Number(d) (Part 3)
Examined(a)  No.(b)  Position(c)  21   22   23
2486-3779    1       2690         C    C    T
2486-3779    2       2864         G    G    G
2486-3779    3       2882         A    G    A
2486-3779    4       2910         T    C    C
2486-3779    5       2949         C    C    C
2486-3779    6       2980         G    G    G
2486-3779    7       2996         T    C    C
2486-3779    8       3259         T    T    T
2486-3779    9       3470         T    C    C
2486-3779    10      3672         C    C    T
2486-3779    11      3707         C    C    C
3872-5302    12      3979         A    A    A
3872-5302    13      3997         C    C    C
3872-5302    14      4140         A    A    A
3872-5302    15      4214         C    C    C
3872-5302    16      4280         C    C    C
3872-5302    17      4313         G    G    G
3872-5302    18      4617         C    C    C
3872-5302    19      4618         G    G    G
3872-5302    20      4992         C    C    C

(a) Region examined represents the nucleotide positions defining the start and stop positions 
within SEQ ID NO: 1 of the regions sequenced; 
(b) PS = polymorphic site;

(c) Position of PS within SEQ ID NO: 1; 

(d) Alleles for FY isogenes are presented 5' to 3' in each column. 

POLYMORPHISMS IN THE FY GENE

GGCAAAGGTT GGGAGTGGCT TTTCCTCTGG TAGCCACACA CCTGAGCACT 

ACGGACAGGG AGGCAGGTGC CACCTTGACA CCTCTCTTCC ATAGCAATGG 100 

GAAAGTGATG AGTGCGGGAG TCCTGAGGAG ATGTGGCCTG CAGACAACAT 

GCAGCCATGC AGGGACCCAG GACTGTAACC TGGGGAGGAC GCGGGTCCCT 200 

GCAAGGAAGA GTAGATTTGG AGAGGAAGGA TGGAGGTGGA CTCTCACCCC 

ATTCCCCCCG GAAATGAACA AAGCCGGGCC CTTTCCATAG GAACTGCCCT 300 

TGGAGATAGC AGAGTGTGGC TGCCCCTCCT TGCTCCAGCA GCAGTGGGAG 

AGGCACTGCT CTGGGGCCTG AACTGCCTCT GCTTCCCCCC CTGAGGGGCC 400 

CCTCACTCTT ACCCAAGACT CTGGATTGTT GCACGGCAAC CACTCCTCCC 
ATGGCATTGC TCAGCAACTA CTTCTCCCTT CCCGGCCACC. CTGTGCCCCC 500 

TTCCTGGTCC CAACGCCAGC CCTTCATCCT TCCTCCCTCA GCAGCCAGGC 

AGACATAACA ACAAAACTAC TAAAAGGAGC TTCACTGCAG TGAGCTGTTT 600 

CCTGCCCAAA CTAAGGGAAT AAT GTGAACT GTGTGCATGT GTGTGGTGTG . 

TATGCATGTG TGCATGTGTG TGTGTGTGTG TGCATGTGTG TGAGTGAGTG 700 

AGAGGCAGAG CGAGGAACTG AGGAGGAGGG CTAAGAGCGA GGGGTCCTGG 

GCAAGTGGAC AGGGCTGTGG GACATGTTGG GGAGGCTTTG GGAATGGGGT 800 

ATTCCTAGTC AGGGTTCACA CCTCACCTGG GATGTTGTTC CATGCTGGTA 

TTTCCTCTGC CACCCCCAAT GCCCATCGGT CTTGGAGAAA GGAGTCCCCG 900 

GGTGTGTGTT TGCCCAGCTG TCCATTCTAT CTCTCCCTTA AACACAGAGC 

ATTCAGCCCT TCCCTGGATT TCCCTCCTCT GAGCCATGGA GTCAGTGCCA 1000. 

CAGCCTTTGC TATGCACCTC TCAGGCCTCT CCTTGGCGTT GACCCTGGAA 

AGACCTACCA CCACCTATTT TTTCCCATAG TCTGTACCCA GTGAGTTGAA 1100 

GGCTGGGTCC CCACCCTTCC TTTTGATTTC CTGTCTTCCT TCTCGTGGCC 

CCAGCTGGTT GCTGTGGAGA TGAGGTTCCT GGTCCTCCCT GTCCTGGCTG 1200 

GACTGCCCCG CCTCAGATCC AGGATGCCCT TGGCATCGCT CCCACCCTCC 

CCCAGCTTTT CCTCCCTGGT CTGACAATGG GCATGCAAAA AGGGGCAGCT 1300 
GCAATCTAGC AGGCCTGCCC ACCCCCTTCA GTTCAGGTAA TACAGTTGTG 

AATCTTCCAG CCGCTGGTTA GGGCCTTGGG • CACCACAGGC AGCCCCTCAC 1400 

CTAAGCCGGG GCCTACTCCT C T T AC AAC AG CAAGAGAGCC CTGGGGCCCC 

AGGCCTGTTG AGCTTCTTGT CTCCCAGCAC CCGCTTTTGG GAAAATGACT 1500 

TTTCCTCTTC AAGCTGAACC ACTCTGTCCA TATTACACAG AAGCCATATT 

TGTACGGGGG GGTGGGAGGG AGAGGGGCTG TTGTGCTGTG TGTGTCTGTC 1600 

CAGGGGTGGG GGGGTGGGGG AAGGGAGCAG GGAGGGGACC GTGTATCTTT 

ATAATCTTTC TAACTCTCCT GTGCTAATCT CAGAGGGGTC ACCCTCAATA 1700 

TATCTGGATT ATCCGTGTCA TTCAGCTGCC TCCTTTCTGG TCCTCTTGCT 

GCTGCTGGGA TGTGTGTATG TGAGGGTCTT CTTCCCATAC CCCTTGCACC 18 00 

TGGTGCCTGG TGCCTCAAAA GGTGGTGTGT CCCTTGCCAG GCCACTCTCA 

AGAATATCTA T G T AC AG CAA C AAT AT AAC T C T AC AAGGGA GAGAAGTGTG 1900 

TTCACTTCCT TTTGCTAAGC CCTTCCTTTC CAGAGAGTGT CTTGGGGGGC 

ATCTGACTGC TTCCCCCCAC CCTCTGCCAG GCATTGCTGG AG AAT G T T AA 2000 

GACGGCGATG GAGATGCCAT CAACCCCACC CTGCAGAGCA TCACCAGACA 

CCACCAGACC AAATTCACTT TCCAGCCCCT TCATGTTGAA CCTGAAACTT 2100 

GAGCTAGTGT CTTGGGAGAA AAGGGGGAAA TCTCTACGAG GTACCCATCC 

TTCTGCACCT TAGGTCTGAG GTGCTTGGCC CCCTAGGAAG CCCTACATGA 2200 

ATGGGACAGA AGGTCCTTAA CAACACTGGA G ATGAAG C AG CCGATGCTGT 

TTTGGACAAA TGAAACAGCG TCCCCTAACC AGCCCTTTCT ATCTCATTGT 2300 

TCTGACTTGG ACACGCCATG GCTCACCGCT CCCAAAGTCC CCACTATGTC 

TCCCTAGCTG AGGAAATAAA AGCAGAGAGG GGTGATGAAA CAGTGACGAT 24 00 

CCTGGGGAAA CAGCTGAGGA GGGGAGGGAG GGGGAAGAAG CCACTAAAAA 

FIGURE 1A 



WO 02/30950 



2/6 



PCT/US01/42725 



AGTGAAATGT 
CTCTCTGCTG 
AGATGCCTGG 
AAT AT GAATA 
TGTCCCATTG 

ACTTCATATG 
CCCAATAGCT 
AACCCTCCCA 
CACCCAGACA 

CACAGCAAAC 

T 

TGCACACACC 

AAACACATGG 
GAGGAGCAGT 
CCTTATCTCT 
GCTTGAAGAA 
TTTCCACTTC 
ATCTCCCTTT 
C 

TCCTGAGTGT 
CAAGGCCAGT 
GGGTAAGGCT 
CATTAGTCCT 

AGCCCTTCTG 
GCACAGGGTG 
CTCATTTCCC 
TCCTCATCTT 

TTCCTTCTCT 
T 

CCTTTGCCTT 
GACCTTGCAC 
TTCCTGCTCC 
CACTGCATCT 
TCTGTCCTCC 



GCTTGGGAGA ATCGGCCTGC 
GCCAGCTCTG CCCCTCAGTG 
CCAATGAAAC . AGTTCCAGAG 
GAAAT CACCC TGTGGGCAAT 
TCCCCTAGAG CCTACTTTAA 



CAAGAGGCAT 
CCCTGAAATG 
CTCTCACACT 
CCCGCCAAGC 
A 

GTACACAGAG 

CCTCAGTTGG 

CTTTTGAACT 
GAGAGT CAGC 
GCCTCACAAG 
TCTCTCCTTG 
GGTAAAATCT 
CCACTTCGGT 

AGTCCCAACC 
GACCCCCATA 
TCCTGATGCC 
TGGCTCTTAT 

C 

TCTGCGGGCC 
AGTATGGGGC 
CTGCTGTTTG 
TTCTCCCTTC 

CCTTCCTATG 

TGAGTCAGTT 
TGCTCCTCCA 
GGCTCTTCAG 
GACTCCTGCA 
CCTCCCACCT 



GTAACTCTGA TGGCCTCCTC 

[exon 1: 4010.. 
CTCAACTGAG AACTCAAGTC 
CCTATGGTGT- GAATGATTCC 

GAAGCAGCTG CCCCCTGCCA 
GCCCTTCTTC ATCCTCACCA 

T 

TCCTCTTCAT GCTTTTCAGA 

■ 

TGGCCTGTCC TGGCACAGCT 

A 

GGTGCCCGTC TTGGCCCCAG 



GCACTGAGCC 
GCTTCATTAT 
GAAACACCCA 
CCCTCACATA 

TTCAGTACAC 

GACAGAGTTG 

C 

GCCTTTCCTT 
CGCCCTTCCA 
TCACCCAGCC 
CTGGAAAGCC 
CTACTTGCTG 
AAAATGCCCA 

AGCCAAATCC 
GGCCTGAGGC 
CCCTGTCCCT 
CTTGGAAGCA 

T GAAC C AAAC 
CAGGCCCCAG 
CCCCTCAGTC 
CCGCTTTTTT 
T 

CTAGCCTCCT 

CCATCCTGGT 
GCCCCAGCTG 
GCTCCCTGCT 
GAGACCTTGT 
GCCCCTCAAT 
G 

TGGGTATGTC 

AGCTGGACTT 
TTCCCAGATG 

CTCCTGTAAC 
GTGTCCTGGG 

CCTCTCTTCC 

T 

GGCTGTGGGC 



CTGCAGGGTA 
AGAAACTTTA 
ACTTTATGTC 
GGTCCCATTT 
CTTGTCAGAC 

T 

CATAGGTGGC 
GCAGCCTCGA 
GACCTAGAGA 
CAGATATGTG 
G 

ACAAAG AG C T 

ACCACCACCA 

GGATCCAGTT 
CTCCAATTTC 
CCCCTCTCTT 
CCCTGTTTTC 
GAAAGCCCCC 
CTTTCTGGTC 

AACCTCAAAA 
TTGTGCAGGC 
GCCCAGAACC 
CAGGCGCTGA 

GGTGCCATGG 
AGTCCCTTAT 
TTTATATCTC 
CCTCTTCCTT 



GATGCCCTTT 
CATATTGCTA 
CCCCAGTAGA 
TAAAAT AT GC 
CATGTATTCC 

TAGGCAAACA 
CAGCCACCCC 
TAGCTAGACA 
CACAATGATA 

CACGCCCACG 
A 

CCTTTCTCCC 
T 

CAAGGGGATG 
CCAGCACCTC 
CCTTCCTTGT 
TCAAT CTCCC 
TGTTTTCTCA 
CCCACCTTTT 

CAGGAAGACC 
AGTGGGCGTG 
TGATGGCCCT 
CAGCCGTCCC 

GGAACTGTCT 
CCCTATGCCC 
TTCCTTTTCC 
CAAAGTCTTT 



AGCTCCCTCT TGTGTCCCTC 



CTCTTGGTGC 
CCCTGGCTTC 
TTGTCCTTTT 
TCTCCCACCC 
TCCCAGGAGA 

CTCCAGGCGG 

CGAAGATGTA 
GAGACTATGA 

G 

CTGCTGGATG 
TATCCTAGCT 



CTTTCCTTCT 
CCCAGGACTG 
CCACTGTCCG 
GACCTTCCTC 
CTCTTCCGGT 
T 

AGCTCTCCCC 

TGGAATTCTT 
TGCCAACCTG 

ACTCTGCACT 
AGCAGC AC T G 



2500 
2600 
2700 

2800 
2900 



3000 

3100 
3200 
3300 

3400 
3500 

3600 
3700 



3800 
3900 
4000 



GCTGGCAGCT CTGCCCTGGC 



AGTGCCCTCT TCAGCATTGT 



4100 



4200 



4300 



GGCTAGGTAG CACTCGCAGC TCTGCCCTGT 



4400 
GTAGCCTGGG CTACTGTGTC TGGTATGGCT
CTGCTAGGGT GCCATGCCTC CCTGGGCCAC 
CCCAGGCCTC ACCCTGGGGC TCACTGTGGG 
TACTGACACT GCCTGTCACC CTGGCCAGTG 
ACCCTGATAT ACAGCAC GGA GCTGAAGGCT 

TA 

AGCCTGTCTT GCCATCTTTG TCTTGTTGCC 
AGGGGCTGAA GAAGGCATTG GGTATGGGGC 
CTGTGGGCCT GGTTTATTTT CTGGTGGCGT 
GGATTTCCTG GTGAGGTCCA AGCTGTTGCT 
AGCAGGCTCT GGACCTGCTG CTGAACCTGG 
CACTGTGTGG CTACGCCCCT GCTCCTCGCC 
CCGCACCCTC TTGCCCTCTC TGCCCCTGCC 

TGGACACCCT TGGAAGCAAA TCCTAGTTCT 

. .5026] 

ATTAAAGTCT ACACTGCCTT TGTGAAGCGG 
GGGAGAAGAA GG AG AAT GGA GAGAGAGACA 
GCCAGTGTCT GCTTCTATAG CTGGCTTGGG 
AATACCCTCA GGGTACACAG ATGTTCTCTT 
TCTCAAGGGA GAAGAGAAGA GGAACCAGAG 
CAAAAAAAAC AG AAGG GAT G GCTTAGCTGG 
GCAAATGGAA TAGGAACTCA AACTGAGAGA 
CAAAGCCCAG AGCAATACCA CCTCCCCCTG 
TCTTCTGTCT CCTCTCTGGC TTTGTTTAGT 
AGGTGAAAGA AGCATCCCAG GGGATGTTAC 
AGGTAATTTA AAAAGCCACT . TCCTGGGAGT 
GCATGACCTG AATGTGCGTG CGTGTGTGTG 
TGTTTCTCGA TCTGTTAGAA TCTACCTTTA 
AAACATATGT CCACCCATGA GCTTGCATCT 
CACACCTGTG CGTGTGCACT GACTTTTCTC 
ATTCTGCACT CATCCCTGTT C AC AG GAT AT 
ACTCCTTACC CAAATGAGTT TTCTTTACCC 
TTCTGTGTAG GATGTGTGGA GGGAAGAAAA 
TGGAGAAACT TGAAGGGGGA GGCCCTGATT 
ATTCCCCGAA TTTCCCTTTC AGAATCTCAG 
TTCCCACATA CATCTTTCCT TCCACCTTCC 
GGGCACCTTT TTCCCAACCC CTGATTCTCT 
TGAGATTTTT CTCAGTCTCT ACCTACCCAA 
AGAAACCCCT CCTCATCAGG GGCACAGCTT 
ACCCTCTACC CAAGAGGCTA CAAAACAGT T 
CTAAAGGCTG GGGAAACTTG AGCAGATACG 
TTACCATCTT ACCATTTTCC AAAG AT AT G C 
AAATGTTTCT GCTTGACTCT CTGGGCTTGG 
AGAGGTGCAG AGATGAGTTA GAATAGCTTA 
TTAGGGAATT TTCCTGGGTG GGTGCCACGA 
CCTCCTGTCT CTTAGCAACC ACCAGGTTAG 
AATTGAAAGG CGGGATTTAG GGACCGATTG 
AACAGAAAGG AAGG GA GAGA AAAT GAAGAG 
CTAAATTATG CTCTGGTTTC CAACCACAGT 
TTTTTCCCCC GCTTTTTTTT TTCCAGGCTT 
TCCTTGACCA CTCTTGCAAT TCTACCAGAT 
GGTACTGATT TGGAAGCTGA CCTAGTTGAG 



CAGCCTTTGC CCAGGCTTTG 

AGACTGGGTG CAGGCCAGGT 4500 

AATTTGGGGA GTGGCTGCCC 

GTGCTTCTGG TGGACTCTGC 4600 

TTGCAGGCCA CACACACTGT 

ATTGGGTTTG TTTGGAGCCA 47 00 

CAGGCCCCTG GAT G AAT AT C 
CATGGGGTGG TTCTAGGACT ' 4800 
GTTGTCAACA TGTCTGGCCC 
CAGAAGCCCT GGCAATTTTG 4 900 

CTATTCTGCC ACCAGGCCAC 
TGAAGGATG'G TCTTCTCATC 5000 

T 

CTTCCCACCT GTCAACCTGA 

GTGGTTTCTT ATTTTGTCTG % 5100 
TTTTTATGTC AGACTTTCTT 
AAGAAGGTGA ATGATGAATA 52 00 

GAGGTGTGGG GTCACGGCCA 
CATGAGGGGA GTCATTAAAC 5300 
AAAAAAAGCT GTTCTGGGAA 
TAAACAGTGA AGAGTGATGA 54 00 

TCCAACCTGC CCAGCCTCTG 
GATTAGGACA GTGGTGGGGA . 5500 
TCAGTTCAGG G AAC AT AT C A 
CATCTCTCCC AGGTTCCTCA 5600 
TGTGTGTGTG TGTACACATC 
TGTTAGATGT ATGCATGTAA 5700 
CTGTCAGCAC CTGAACTGCG 
AGGACCCAAA CCCCCACTCA 58 00 

AGAATCGGGA TTTATGACTC 
TGGTTTTTAA GCCTAGTCTT 5900 
GATCAAGAAG TTGTGAAGGG 
TGATTCATCT TCTGCTTGGA 6000 
CTTTTGAAAT AAACCTTTAT 
AC AC AAT AC C CCAATCCCCT 6100 
GGCTGCTTAA TCATGACCTT 
GTTTAGATGG CTGGAAGGAC 62 00 

TTACCACCAA GAGCAAATTC 
AGTTCCTACC TCTAACCCAA 6300 
TTCTATCAGT TTGAACCCAA 
TATACCTGGT TTCTTTACTA 64 00 

GAATAGTAGG CGAGTGCGGG 
GGCAGGAGGG TGCAAAAGGC 6500 
CAAGGCCTCT AAATCTCCCA 
CTCCTGATTG GTTCGTCCTC . 6600 

AGACGCGGGA G AC AT T C T G A 
AAAGGAAATA ATTTACAAAC 67 00 

TCATGAATGT GTTCTAGTAT 
CTCTCAATAT CCCCCTCCCG 6800 
GTTGCTGTCC TCCCTTACAA 
GGGGAGGAGA GGGCGTTTTT 6900 
GACTCCCTGA ATCTTCCAGT GTCAACCTGA TGCAAGGGAG GCTTAATTTA
AGACCAGTAG GCTTGTCTTA TCTGCCCCCA ACCCTGTGCC TCTGGATAGA 7000 
AATCCCTGGT CAGTCAGTCC AGT TAGAGAG AACCCCAGAC TCCTGGGTAA 
TAGCTTGGCA GCTCTCATGG CTTTCACAAG GGAAAGGCAG CTGCAGAAGC 7100 
CCGAAGCTGC TAAGAGGTTA GGGTGGGCTG GAGACAGTGC CCTACCCCCG 
CCCCCTGCTA CATCCTCCTC ATCCCCACCC CCACCGGGAT TGCTCCAGGC 7200 
CTTTTGGGCT GCCCTTTCCC TGCCATTACC TAGGCAGCAC TTGGAGAGCT 
CCTCCTTAAG TCTAACCCGG ACCTCAGTCA TTTCTTTAAA GCTTTCTTGG 7300 
GGACCTGCCA CCCCATGCAT TTAACCCACT GCATGCCATC AACCACTCTA 
AAATTGGTCT GAGTCTGGCA TCTTTTCTGC AACCCTTCAG GAATACAAAT 7400 
CCTGTCTCCT TAAAGCCCTT AAGAATTTAA TCTTAGGGTT GGCAGGGACT 
TTAGCTGTGT AT GAG AT ATT GGGCATCCTA GCTAAAGAAA AAAATCCTCT 7500 
CAGAAAGATG AGAGCCAGGG AAGCAAGCTC TTGGGAAAAC ACAGGACCCT 
GAGGAAGGTC AGTTTGCTTT GCTTTCTAAA GGAGAGAGAT CTATTATTCA 7 600 

AGGGAAGTTT GAACAT CAC A TTGACGCTCA TAGTTCATTT ATTCCAAGCT 
GAGGCCCCTC CCTTAGGATT TAGAAAACAA ATACTTGGTC CTCACACCCT 7700 
TTTTCCATTC CTATTTCCCT ATCCCCCAAC CCCATCACCA CCTTCCTCCC 
TCAGAGGAAT TCTGATTGAG AACTTCACTG GGATTTCAAA CCCAATTCAT 7800 
CGCCAACTCT AATTGCCAGA GATTTGCATG AAAACCATCG TATGCTATCT 
AATTATTCTG ACAAC AG C AG CCCGCCGTCT GGGCACAAGG AGAATCGGAG 7900 
TTTTAATTAA CAATAATGCA CCTTGCTGAC GAATGCGACT GTTTAGGTTA 
ATTAACAAGT CCAAGTCCTT CCAAATCATC TCTAGACATC TAGGTGATTT 8000 
GGGCAGGAAG GGTGTGGGGA ACACAGGGAG GGATGGGGAG TGTTTAAGCA 
TCATTTCTGC AAAAAT GCAC GTTAGCTTTC TTCTTTCCTG TAACTATTTG 8100 
GTGAAGGGAA G AGAAAC T C T CTAAGAGACT GGCTCTGGAA AATTGGTTGG 
GGGATTTTGA GAACAT C TT C TTTTTTTTTT TTTTTTTTTG AGACAGAGTC 8200 
TCACTCTGTT GCCCAGGCTG GAGTGTAGTG GTGCAATCTT GGCTCACTGC 
AACCTCCGCC TCCCAGGTTC AAGTGATTCT CCTGCCTCAG CCTCCTGAGT 8300 
AGCTGGGATT ACAGGTGTGC ACCACCACGC CAGGCTAATT TTTTGTATTT 
TTAGTAGAGA CGGGGGGGTC TCACCAGTTT GGCCAGCCTG GTCTCGAACT -8400 
CTGACTTCAG GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATCAC 
AGGCGTGAGC CACCGCGCCC GGCGGGAACA TCATTTTAAG GGGATGTATC 8500 
AGACATCTTT ATGTTGCACT TAGATTTAGG AAATCTTTTG GAT AC AT T T T 
TATAAATGAG AAGATTAAGT TCTTATAGCT CTCTAGTATC TCAAAATCAT 8600 
TGCCTGATTG TTTGCAAACT TGGTTTCTAG CAT GAAAGTC TCAACTTCCC 
CATCAATGCC ATTTGTCCTC AGCTTTCTCT ATATGTTCCT ACCACATCTG 8700 
TGGTCATTTA AAGTTGCCTA CTGCTTGTGA ACCCGGGAGG TGGAGCTTGC 
AGTAAGCCGA GATCGCGCCA CTGCACTCCA GCCTGAGCGA C AGAGT G AG A 8800 
CTCCATCTCA AAAAAAAAAA AAAAAAAAGT TGCCTACTGC CTTTGGTTTC 
CCAGATAACG TGTCAAGTTT CACCCTTGCC CTCTTCAAAG ATAACTGTAT 8 900 

TTTTTTTTCC TGGGTAGTTC TCCGTATCAT GCAAAAATAC ATTGTATGTA 
GCTCCAAACT GTACCTTTCA TCTTTCTAGT CTTTCTAAGA GCATGGACCT 9000 
AGTCTTTTTC CTCTAAATAG GGTAT 9025 

POLYMORPHISMS IN THE CODING SEQUENCE OF FY


ATGGCCTCCT CTGGGTATGT CCTCCAGGCG GAGCTCTCCC CCTCAACTGA 
GAACTCAAGT CAGCTGGACT TCGAAGATGT ATGGAATTCT TCCTATGGTG 100
TGAATGATTC CTTCCCAGAT GGAGACTATG ATGCCAACCT GGAAGCAGCT 

G 

GCCCCCTGCC ACTCCTGTAA CCTGCTGGAT GACTCTGCAC TGCCCTTCTT 200 
CATCCTCACC AGTGTCCTGG GTATCCTAGC TAGCAGCACT GTCCTCTTCA 
T 

TGCTTTTCAG ACCTCTCTTC CGCTGGCAGC TCTGCCCTGG CTGGCCTGTC 300 

T 

CTGGCACAGC TGGCTGTGGG CAGTGCCCTC TTCAGCATTG TGGTGCCCGT
A 

CTTGGCCCCA GGGCTAGGTA GCACTCGCAG CTCTGCCCTG TGTAGCCTGG 400 
GCTACTGTGT CTGGTATGGC TCAGCCTTTG CCCAGGCTTT GCTGCTAGGG 
TGCCATGCCT CCCTGGGCCA CAGACTGGGT GCAGGCCAGG TCCCAGGCCT 500
CACCCTGGGG CTCACTGTGG GAATTTGGGG AGTGGCTGCC CTACTGACAC 
TGCCTGTCAC CCTGGCCAGT GGTGCTTCTG GTGGACTCTG CACCCTGATA 600 
TACAGCACGG AGCTGAAGGC TTTGCAGGCC ACACACACTG TAGCCTGTCT
TA 

TGCCATCTTT GTCTTGTTGC CATTGGGTTT GTTTGGAGCC AAGGGGCTGA 700
AGAAGGCATT GGGTATGGGG CCAGGCCCCT GGATGAATAT CCTGTGGGCC
TGGTTTATTT TCTGGTGGCC TCATGGGGTG GTTCTAGGAC TGGATTTCCT 800 
GGTGAGGTCC AAGCTGTTGC TGTTGTCAAC ATGTCTGGCC CAGCAGGCTC 
TGGACCTGCT GCTGAACCTG GCAGAAGCCC TGGCAATTTT GCACTGTGTG 900 
GCTACGCCCC TGCTCCTCGC CCTATTCTGC CACCAGGCCA CCCGCACCCT
CTTGCCCTCT CTGCCCCTCC CTGAAGGATG GTCTTCTCAT CTGGACACCC 1000

T

TTGGAAGCAA ATCCTAG 1017 
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ISOFORMS OF THE FY PROTEIN 

* 

MASSGYVLQA ELSPSTENSS QLDFEDVWNS SYGVNDSFPD GDYDANLEAA

G 

APCHSCNLLD DSALPFFILT SVLGILASST VLFMLFRPLF RWQLCPGWPV 100 

F C 
LAQLAVGSAL FSIVVPVLAP GLGSTRSSAL CSLGYCVWYG SAFAQALLLG
T 

CHASLGHRLG AGQVPGLTLG LTVGIWGVAA LLTLPVTLAS GASGGLCTLI 200

YSTELKALQA THTVACLAIF VLLPLGLFGA KGLKKALGMG PGPWMNILWA 
X 

WFIFWWPHGV VLGLDFLVRS KLLLLSTCLA QQALDLLLNL AEALAILHCV 300 
ATPLLLALFC HQATRTLLPS LPLPEGWSSH LDTLGSKS 338 

F 

X = M or I
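
The residue marked X can be checked by combining the two adjacent coding-region polymorphisms. The sketch below is illustrative only. It assumes, from aligning the coding sequence with the genomic sequence in the sequence listing, that PS18 and PS19 fall on the second and third bases of codon 203 (reference codon ACG, Thr). The names `CODON_TABLE` and `codon_203_variants` are hypothetical, and the codon table is deliberately limited to the four codons involved.

```python
# Partial codon table: only the four codons that can arise at codon 203.
CODON_TABLE = {"ACG": "T", "ACA": "T", "ATG": "M", "ATA": "I"}

REFERENCE_CODON = "ACG"    # assumed coding positions 607-609 (Thr203)
PS18_ALLELES = ("C", "T")  # assumed second base of the codon
PS19_ALLELES = ("G", "A")  # assumed third base of the codon


def codon_203_variants():
    """Map every PS18/PS19 allele combination to its encoded amino acid."""
    variants = {}
    for second in PS18_ALLELES:
        for third in PS19_ALLELES:
            codon = REFERENCE_CODON[0] + second + third
            variants[codon] = CODON_TABLE[codon]
    return variants


if __name__ == "__main__":
    for codon, residue in codon_203_variants().items():
        print(codon, residue)
```

Under these assumptions, the only non-reference residues are M and I, which agrees with the legend X = M or I.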
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FY_SEQ Listing TEMPLATE.ST25.txt 
SEQUENCE LISTING 

<110> Genaissance Pharmaceuticals, Inc. 
Chew, Anne 
Choi, Julie Y. 
Koshy, Beena 

<120> HAPLOTYPES OF THE FY GENE

<130> FY_MWH-1440PCT

<140> TBA

<141> 2001-10-12 

<150> 60/240,275 
<151> 2000-10-13 

<160> 84 

<170> PatentIn version 3.1

<210> 1 

<211> 9025 

<212> DNA

<213> Homo sapiens 

<220> 

<221> allele 

<222> (2690)..(2690)

<223> PS1: polymorphic base C or T 



<220> 

<221> allele 

<222> (2864) . . (2864) 

<223> PS2: polymorphic base G or A 



<220> 

<221> allele 

<222> (2882) . . (2882) 

<223> PS3: polymorphic base A or G



<220> 

<221> allele 

<222> (2910) . . (2910) 

<223> PS4: polymorphic base C or T



<220>

<221> allele

<222> (2949)..(2949)

<223> PS5: polymorphic base C or A



<220>

<221> allele

<222> (2980)..(2980)

<223> PS6: polymorphic base G or C



<220> 

<221> allele 

<222> (2996)..(2996)

<223> PS7: polymorphic base C or T



<220>

<221> allele

<222> (3259)..(3259)

<223> PS8: polymorphic base T or C



<220>

<221> allele

<222> (3470)..(3470)

<223> PS9: polymorphic base T or C



<220>

<221> allele

<222> (3672)..(3672)

<223> PS10: polymorphic base C or T



<220>

<221> allele

<222> (3707)..(3707)

<223> PS11: polymorphic base C or T



<220>

<221> allele 

<222> (3979) . . (3979) 

<223> PS12: polymorphic base A or G 



<220> 

<221> allele 

<222> (3997) . . (3997) 

<223> PS13: polymorphic base C or T 



<220> 

<221> allele 

<222> (4140) . . (4140) 

<223> PS14: polymorphic base A or G 



<220> 

<221> allele 

<222> (4214)..(4214)

<223> PS15: polymorphic base C or T 



<220> 

<221> allele 

<222> (4280) . . (4280) 

<223> PS16: polymorphic base C or T 



<220> 

<221> allele 

<222> (4313) . . (4313) 

<223> PS17: polymorphic base G or A 



<220>

<221> allele

<222> (4617)..(4617)

<223> PS18: polymorphic base C or T



<220>

<221> allele

<222> (4618)..(4618)

<223> PS19: polymorphic base G or A



<220>

<221> allele

<222> (4992)..(4992)

<223> PS20: polymorphic base C or T



<400> 1 

ggcaaaggtt gggagtggct tttcctctgg tagccacaca cctgagcact acggacaggg 60

aggcaggtgc caccttgaca cctctcttcc atagcaatgg gaaagtgatg agtgcgggag 120

tcctgaggag atgtggcctg cagacaacat gcagccatgc agggacccag gactgtaacc 180

tggggaggac gcgggtccct gcaaggaaga gtagatttgg agaggaagga tggaggtgga 240

ctctcacccc attccccccg gaaatgaaca aagccgggcc ctttccatag gaactgccct 300

tggagatagc agagtgtggc tgcccctcct tgctccagca gcagtgggag aggcactgct 360

ctggggcctg aactgcctct gcttcccccc ctgaggggcc cctcactctt acccaagact 420

ctggattgtt gcacggcaac cactcctccc atggcattgc tcagcaacta cttctccctt 480

cccggccacc ctgtgccccc ttcctggtcc caacgccagc ccttcatcct tcctccctca 540

gcagccaggc acfacataaca acaaaactac taaaaggagc ttcactgcag tgagctgttt 600

cctgcccaaa ctaagggaat aatgtgaact gtgtgcatgt gtgtggtgtg tatgcatgtg 660

tgcatgtgtg tgtgtgtgtg tgcatgtgtg tgagtgagtg agaggcagag cgaggaactg 720

aggaggaggg ctaagagcca ggggtcctgg gcaagtggac agggctgtgg gacatgttgg 780

ggaggctttg ggaatggggt attcctagtc agggttcaca cctcacctgg gatgttgttc 840

catgctggta tttcctctgc cacccccaat gcccatcggt cttggagaaa ggagtccccg 900

ggtgtgtgtt tgcccagctg tccattctat ctctccctta aacacagagc attcagccct 960

tccctggatt tccctcctct gagccatgga gtcagtgcca cagcctttgc tatgcacctc 1020

tcaggcctct ccttggcgtt gaccctggaa agacctacca ccacctattt tttcccatag 1080

tctgtaccca gtgagttgaa ggctgggtcc ccacccttcc ttttgatttc ctgtcttcct 1140

tctcgtggcc ccagctggtt gctgtggaga tgaggttcct ggtcctccct gtcctggctg 1200

gactgccccg cctcagatcc aggatgccct tggcatcgct cccaccctcc cccagctttt 1260

cctccctggt ctgacaatgg gcatgcaaaa aggggcagct gcaatctagc aggcctgccc 1320

acccccttca gttcaggtaa tacagttgtg aatcttccag ccgctggtta gggccttggg 1380

caccacaggc agcccctcac ctaagccggg gcctactcct cttacaacag caagagagcc 1440

ctggggcccc aggcctgttg agcttcttgt ctcccagcac ccgcttttgg gaaaatgact 1500

tttcctcttc aagctgaacc actctgtcca tattacacag aagccatatt tgtacggggg 1560

ggtgggaggg agaggggctg ttgtgctgtg tgtgtctgtc caggggtggg ggggtggggg 1620

aagggagcag ggaggggacc gtgtatcttt ataatctttc taactctcct gtgctaatct 1680

cagaggggtc accctcaata tatctggatt atccgtgtca ttcagctgcc tcctttctgg 1740

tcctcttgct gctgctggga tgtgtgtatg tgagggtctt cttcccatac cccttgcacc 1800

tggtgcctgg tgcctcaaaa ggtggtgtgt cccttgccag gccactctca agaatatcta 1860

tgtacagcaa caatataact ctacaaggga gagaagtgtg ttcacttcct tttgctaagc 1920

ccttcctttc cagagagtgt cttggggggc atctgactgc ttccccccac cctctgccag 1980

gcattgctgg agaatgttaa gacggcgatg gagatgccat caaccccacc ctgcagagca 2040

tcaccagaca ccaccagacc aaattcactt tccagcccct tcatgttgaa cctgaaactt 2100

gagctagtgt cttgggagaa aagggggaaa tctctacgag gtacccatcc ttctgcacct 2160

taggtctgag gtgcttggcc ccctaggaag ccctacatga atgggacaga aggtccttaa 2220

caacactgga gatgaagcag ccgatgctgt tttggacaaa tgaaacagcg tcccctaacc 2280

agccctttct atctcattgt tctgacttgg acacgccatg gctcaccgct cccaaagtcc 2340

ccactatgtc tccctagctg aggaaataaa agcagagagg ggtgatgaaa cagtgacgat 2400

cctggggaaa cagctgagga ggggagggag ggggaagaag ccactaaaaa agtgaaatgt 2460

gcttgggaga atcggcctgc ctgcagggta gatgcccttt ctctctgctg gccagctctg 2520

cccctcagtg agaaacttta catattgcta agatgcctgg ccaatgaaac agttccagag 2580

actttatgtc ccccagtaga aatatgaata gaaatcaccc tgtgggcaat ggtcccattt 2640

taaaatatgc tgtcccattg tcccctagag cctactttaa cttgtcagay catgtattcc 2700



WO 02/30950 



PCT/US01/42725 




acttcatatg caagaggcat gcactgagcc cataggtggc taggcaaaca cccaatagct 2760

ccctgaaatg gcttcattat gcagcctcga cagccacccc aaccctccca ctctcacact 2820 

gaaacaccca gacctagaga tagctagaca cacccagaca cccrccaagc ccctcacata 2880 

crgatatgtg cacaatgata cacagcaaay gtacacagag ttcagtacac acaaagagct 2940 

cacgcccamg tgcacacacc cctcagttgg gacagagtts accaccacca cctttytccc 3000 

aaacacatgg cttttgaact gcctttcctt ggatccagtt caaggggatg gaggagcagt 3060 

gagagtcagc cgcccttcca ctccaatttc ccagcacctc ccttatctct gcctcacaag 3120 

tcacccagcc cccctctctt ccttccttgt gcttgaagaa tctctccttg ctggaaagcc 3180 

ccctgttttc tcaatctccc tttccacttc ggtaaaatct ctacttgctg gaaagccccc 3240 

tgttttctca atctccctyt ccacttcggt aaaatgccca ctttctggtc cccacctttt 3300 

tcctgagtgt agtcccaacc agccaaatcc aacctcaaaa caggaagacc caaggccagt 3360 

gacccccata ggcctgaggc ttgtgcaggc agtgggcgtg gggtaaggct tcctgatgcc 3420

ccctgtccct gcccagaacc tgatggccct cattagtcct tggctcttay cttggaagca 3480 

caggcgctga cagccgtccc agcccttctg tctgcgggcc tgaaccaaac ggtgccatgg 3540 

ggaactgtct gcacagggtg agtatggggc caggccccag agtcccttat ccctatgccc 3600 

ctcatttccc ctgctgtttg cccctcagtc tttatatctc ttccttttcc tcctcatctt 3660 

ttctcccttc cygctttttt cctcttcctt caaagtcttt ttccttytct ccttcctatg 3720 

ctagcctcct agctccctct tgtgtccctc cctttgcctt tgagtcagtt ccatcctggt 3780 

ctcttggtgc ctttccttct gaccttgcac tgctcctcca gccccagctg ccctggcttc 3840 

cccaggactg ttcctgctcc ggctcttcag gctccctgct ttgtcctttt ccactgtccg 3900 

cactgcatct gactcctgca gagaccttgt tctcccaccc gaccttcctc tctgtcctcc 3960

cctcccacct gcccctcart tcccaggaga ctcttcyggt gtaactctga tggcctcctc 4020

tgggtatgtc ctccaggcgg agctctcccc ctcaactgag aactcaagtc agctggactt 4080

cgaagatgta tggaattctt cctatggtgt gaatgattcc ttcccagatg gagactatgr 4140

tgccaacctg gaagcagctg ccccctgcca ctcctgtaac ctgctggatg actctgcact 4200 

gcccttcttc atcytcacca gtgtcctggg tatcctagct agcagcactg tcctcttcat 4260

gcttttcaga cctctcttcy gctggcagct ctgccctggc tggcctgtcc tgrcacagct 4320 

ggctgtgggc agtgccctct tcagcattgt ggtgcccgtc ttggccccag ggctaggtag 4380 

cactcgcagc tctgccctgt gtagcctggg ctactgtgtc tggtatggct cagcctttgc 4440

ccaggctttg ctgctagggt gccatgcctc cctgggccac agactgggtg caggccaggt 4500 

cccaggcctc accctggggc tcactgtggg aatttgggga gtggctgccc tactgacact 4560 

gcctgtcacc ctggccagtg gtgcttctgg tggactctgc accctgatat acagcayrga 4620

gctgaaggct ttgcaggcca cacacactgt agcctgtctt gccatctttg tcttgttgcc 4680

attgggtttg tttggagcca aggggctgaa gaaggcattg ggtatggggc caggcccctg 4740

gatgaatatc ctgtgggcct ggtttatttt ctggtggcct catggggtgg ttctaggact 4800

ggatttcctg gtgaggtcca agctgttgct gttgtcaaca tgtctggccc agcaggctct 4860

ggacctgctg ctgaacctgg cagaagccct ggcaattttg cactgtgtgg ctacgcccct 4920

gctcctcgcc ctattctgcc accaggccac ccgcaccctc ttgccctctc tgcccctccc 4980

tgaaggatgg tyttctcatc tggacaccct tggaagcaaa tcctagttct cttcccacct 5040

gtcaacctga attaaagtct acactgcctt tgtgaagcgg gtggtttctt attttgtctg 5100 

gggagaagaa ggagaatgga gagagagaca tttttatgtc agactttctt gccagtgtct 5160 

gcttctatag ctggcttggg aagaaggtga atgatgaata aataccctca gggtacacag 5220 

atgttctctt gaggtgtggg gtcacggcca tctcaaggga gaagagaaga ggaaccagag 5280 

catgagggga gtcattaaac caaaaaaaac agaagggatg gcttagctgg aaaaaaagct 5340

gttctgggaa gcaaatggaa taggaactca aactgagaga taaacagtga agagtgatga 5400 

caaagcccag agcaatacca cctccccctg tccaacctgc ccagcctctg tcttctgtct 5460

cctctctggc tttgtttagt gattaggaca gtggtgggga aggtgaaaga agcatcccag 5520 

gggatgttac tcagttcagg gaacatatca aggtaattta aaaagccact tcctgggagt 5580

catctctccc aggttcctca gcatgacctg aatgtgcgtg cgtgtgtgtg tgtgtgtgtg 5640 

tgtacacatc tgtttctcga tctgttagaa tctaccttta tgttagatgt atgcatgtaa 5700 

aaacatatgt ccacccatga gcttgcatct ctgtcagcac ctgaactgcg cacacctgtg 5760

cgtgtgcact gacttttctc aggacccaaa cccccactca attctgcact catccctgtt 5820

cacaggatat agaatcggga tttatgactc actccttacc caaatgagtt ttctttaccc 5880 

tggtttttaa gcctagtctt ttctgtgtag gatgtgtgga gggaagaaaa gatcaagaag 5940 

ttgtgaaggg tggagaaact tgaaggggga ggccctgatt tgattcatct tctgcttgga 6000 

attccccgaa tttccctttc agaatctcag cttttgaaat aaacctttat ttcccacata 6060 

catctttcct tccaccttcc acacaatacc ccaatcccct gggcaccttt ttcccaaccc 6120 

ctgattctct ggctgcttaa tcatgacctt tgagattttt ctcagtctct acctacccaa 6180 

gtttagatgg ctggaaggac agaaacccct cctcatcagg ggcacagctt ttaccaccaa 6240 

gagcaaattc accctctacc caagaggcta caaaacagtt agttcctacc tctaacccaa 6300 

ctaaaggctg gggaaacttg agcagatacg ttctatcagt ttgaacccaa ttaccatctt 6360 

accattttcc aaagatatgc tatacctggt ttctttacta aaatgtttct gcttgactct 6420 

ctgggcttgg gaatagtagg cgagtgcggg agaggtgcag agatgagtta gaatagctta 6480

ggcaggaggg tgcaaaaggc ttagggaatt ttcctgggtg ggtgccacga caaggcctct 6540 

aaatctccca cctcctgtct cttagcaacc accaggttag ctcctgattg gttcgtcctc 6600

aattgaaagg cgggatttag ggaccgattg agacgcggga gacattctga aacagaaagg 6660

aagggagaga aaatgaagag aaaggaaata atttacaaac ctaaattatg ctctggtttc 6720

caaccacagt tcatgaatgt gttctagtat tttttccccc gctttttttt ttccaggctt 6780

ctctcaatat ccccctcccg tccttgacca ctcttgcaat tctaccagat gttgctgtcc 6840

tcccttacaa ggtactgatt tggaagctga cctagttgag ggggaggaga gggcgttttt 6900

gactccctga atcttccagt gtcaacctga tgcaagggag gcttaattta agaccagtag 6960

gcttgtctta tctgccccca accctgtgcc tctggataga aatccctggt cagtcagtcc 7020

agttagagag aaccccagac tcctgggtaa tagcttggca gctctcatgg ctttcacaag 7080

ggaaaggcag ctgcagaagc ccgaagctgc taagaggtta gggtgggctg gagacagtgc 7140

cctacccccg ccccctgcta catcctcctc atccccaccc ccaccgggat tgctccaggc 7200

cttttgggct gccctttccc tgccattacc taggcagcac ttggagagct cctccttaag 7260

tctaacccgg acctcagtca tttctttaaa gctttcttgg ggacctgcca ccccatgcat 7320

ttaacccact gcatgccatc aaccactcta aaattggtct gagtctggca tcttttctgc 7380

aacccttcag gaatacaaat cctgtctcct taaagccctt aagaatttaa tcttagggtt 7440

ggcagggact ttagctgtgt atgagatatt gggcatccta gctaaagaaa aaaatcctct 7500

cagaaagatg agagccaggg aagcaagctc ttgggaaaac acaggaccct gaggaaggtc 7560

agtttgcttt gctttctaaa ggagagagat ctattattca agggaagttt gaacatcaca 7620

ttgacgctca tagttcattt attccaagct gaggcccctc ccttaggatt tagaaaacaa 7680

atacttggtc ctcacaccct ttttccattc ctatttccct atcccccaac cccatcacca 7740

ccttcctccc tcagaggaat tctgattgag aacttcactg ggatttcaaa cccaattcat 7800

cgccaactct aattgccaga gatttgcatg aaaaccatcg tatgctatct aattattctg 7860

acaacagcag cccgccgtct gggcacaagg agaatcggag ttttaattaa caataatgca 7920

ccttgctgac gaatgcgact gtttaggtta attaacaagt ccaagtcctt ccaaatcatc 7980

tctagacatc taggtgattt gggcaggaag ggtgtgggga acacagggag ggatggggag 8040

tgtttaagca tcatttctgc aaaaatgcac gttagctttc ttctttcctg taactatttg 8100

gtgaagggaa gagaaactct ctaagagact ggctctggaa aattggttgg gggattttga 8160

gaacatcttc tttttttttt tttttttttg agacagagtc tcactctgtt gcccaggctg 8220

gagtgtagtg gtgcaatctt ggctcactgc aacctccgcc tcccaggttc aagtgattct 8280

cctgcctcag cctcctgagt agctgggatt acaggtgtgc accaccacgc caggctaatt 8340

ttttgtattt ttagtagaga cgggggggtc tcaccagttt ggccagcctg gtctcgaact 8400

ctgacttcag gtgatccacc tgcctcagcc tcccaaagtg ctgggatcac aggcgtgagc 8460

caccgcgccc ggcgggaaca tcattttaag gggatgtatc agacatcttt atgttgcact 8520

tagatttagg aaatcttttg gatacatttt tataaatgag aagattaagt tcttatagct 8580

ctctagtatc tcaaaatcat tgcctgattg tttgcaaact tggtttctag catgaaagtc 8640

tcaacttccc catcaatgcc atttgtcctc agctttctct atatgttcct accacatctg 8700

tggtcattta aagttgccta ctgcttgtga acccgggagg tggagcttgc agtaagccga 8760

gatcgcgcca ctgcactcca gcctgagcga cagagtgaga ctccatctca aaaaaaaaaa 8820

aaaaaaaagt tgcctactgc ctttggtttc ccagataacg tgtcaagttt cacccttgcc 8880

ctcttcaaag ataactgtat ttttttttcc tgggtagttc tccgtatcat gcaaaaatac 8940

attgtatgta gctccaaact gtacctttca tctttctagt ctttctaaga gcatggacct 9000

agtctttttc ctctaaatag ggtat 9025


<210> 2 

<211> 1017 

<212> DNA 

<213> Homo sapiens 

<400> 2 

atggcctcct ctgggtatgt cctccaggcg gagctctccc cctcaactga gaactcaagt 60

cagctggact tcgaagatgt atggaattct tcctatggtg tgaatgattc cttcccagat 120

ggagactatg atgccaacct ggaagcagct gccccctgcc actcctgtaa cctgctggat 180

gactctgcac tgcccttctt catcctcacc agtgtcctgg gtatcctagc tagcagcact 240

gtcctcttca tgcttttcag acctctcttc cgctggcagc tctgccctgg ctggcctgtc 300

ctggcacagc tggctgtggg cagtgccctc ttcagcattg tggtgcccgt cttggcccca 360 

gggctaggta gcactcgcag ctctgccctg tgtagcctgg gctactgtgt ctggtatggc 420 

tcagcctttg cccaggcttt gctgctaggg tgccatgcct ccctgggcca cagactgggt 480

gcaggccagg tcccaggcct caccctgggg ctcactgtgg gaatttgggg agtggctgcc 540 

ctactgacac tgcctgtcac cctggccagt ggtgcttctg gtggactctg caccctgata 600 

tacagcacgg agctgaaggc tttgcaggcc acacacactg tagcctgtct tgccatcttt 660 

gtcttgttgc cattgggttt gtttggagcc aaggggctga agaaggcatt gggtatgggg 720 

ccaggcccct ggatgaatat cctgtgggcc tggtttattt tctggtggcc tcatggggtg 780 

gttctaggac tggatttcct ggtgaggtcc aagctgttgc tgttgtcaac atgtctggcc 840
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cagcaggctc tggacctgct gctgaacctg gcagaagccc tggcaatttt gcactgtgtg 900 

gctacgcccc tgctcctcgc cctattctgc caccaggcca cccgcaccct cttgccctct 960 

ctgcccctcc ctgaaggatg gtcttctcat ctggacaccc ttggaagcaa atcctag 1017

<210> 3 

<211> 338 

<212> PRT 

<213> Homo sapiens 

<400> 3 

Met Ala Ser Ser Gly Tyr Val Leu Gln Ala Glu Leu Ser Pro Ser Thr
1 5 10 15

Glu Asn Ser Ser Gln Leu Asp Phe Glu Asp Val Trp Asn Ser Ser Tyr
20 25 30

Gly Val Asn Asp Ser Phe Pro Asp Gly Asp Tyr Asp Ala Asn Leu Glu
35 40 45

Ala Ala Ala Pro Cys His Ser Cys Asn Leu Leu Asp Asp Ser Ala Leu
50 55 60

Pro Phe Phe Ile Leu Thr Ser Val Leu Gly Ile Leu Ala Ser Ser Thr
65 70 75 80

Val Leu Phe Met Leu Phe Arg Pro Leu Phe Arg Trp Gln Leu Cys Pro
85 90 95

Gly Trp Pro Val Leu Ala Gln Leu Ala Val Gly Ser Ala Leu Phe Ser
100 105 110

Ile Val Val Pro Val Leu Ala Pro Gly Leu Gly Ser Thr Arg Ser Ser
115 120 125

Ala Leu Cys Ser Leu Gly Tyr Cys Val Trp Tyr Gly Ser Ala Phe Ala
130 135 140

Gln Ala Leu Leu Leu Gly Cys His Ala Ser Leu Gly His Arg Leu Gly
145 150 155 160

Ala Gly Gln Val Pro Gly Leu Thr Leu Gly Leu Thr Val Gly Ile Trp
165 170 175

Gly Val Ala Ala Leu Leu Thr Leu Pro Val Thr Leu Ala Ser Gly Ala
180 185 190

Ser Gly Gly Leu Cys Thr Leu Ile Tyr Ser Thr Glu Leu Lys Ala Leu
195 200 205

Gln Ala Thr His Thr Val Ala Cys Leu Ala Ile Phe Val Leu Leu Pro
210 215 220

Leu Gly Leu Phe Gly Ala Lys Gly Leu Lys Lys Ala Leu Gly Met Gly
225 230 235 240

Pro Gly Pro Trp Met Asn Ile Leu Trp Ala Trp Phe Ile Phe Trp Trp
245 250 255

Pro His Gly Val Val Leu Gly Leu Asp Phe Leu Val Arg Ser Lys Leu
260 265 270

Leu Leu Leu Ser Thr Cys Leu Ala Gln Gln Ala Leu Asp Leu Leu Leu
275 280 285

Asn Leu Ala Glu Ala Leu Ala Ile Leu His Cys Val Ala Thr Pro Leu
290 295 300

Leu Leu Ala Leu Phe Cys His Gln Ala Thr Arg Thr Leu Leu Pro Ser
305 310 315 320

Leu Pro Leu Pro Glu Gly Trp Ser Ser His Leu Asp Thr Leu Gly Ser
325 330 335

Lys Ser



<210> 4 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 4 

tgtcagayca tgtat 15 

<210> 5 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 5 

gacacccrcc aagcc 15 

<210> 6 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 6 

cacatacrga tatgt 15 

<210> 7 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 7

cagcaaaygt acaca 15 



<210> 8 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 8 

acgcccamgt gcaca 15 

<210> 9 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 9

cagagttsac cacca 15



<210> 10 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 10 

cacctttytc ccaaa 15 

<210> 11

<211> 15

<212> DNA

<213> Homo sapiens

<400> 11

tctccctytc cactt 15

<210> 12

<211> 15

<212> DNA

<213> Homo sapiens

<400> 12

cccttccygc ttttt 15

<210> 13

<211> 15

<212> DNA

<213> Homo sapiens

<400> 13

tttccttytc tcctt 15

<210> 14

<211> 15

<212> DNA

<213> Homo sapiens

<400> 14

cccctcartt cccag 15

<210> 15

<211> 15

<212> DNA

<213> Homo sapiens

<400> 15

actcttcygg tgtaa 15

<210> 16

<211> 15

<212> DNA

<213> Homo sapiens

<400> 16

cttcatcytc accag 15



<210> 17

<211> 15

<212> DNA

<213> Homo sapiens

<400> 17

tacagcaygg agctg 15



<210> 18 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 18 

acagcacrga gctga 15 



<210> 19

<211> 15

<212> DNA

<213> Homo sapiens

<400> 19

ggatggtytt ctcat 15

<210> 20

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 20

ttaacttgtc agayc 15

<210> 21

<211> 15

<212> DNA

<213> Homo sapiens

<400> 21

agtggaatac atgrt 15

<210> 22

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 22

cacccagaca cccrc 15

<210> 23

<211> 15

<212> DNA

<213> Homo sapiens

<400> 23

gtgaggggct tggyg 15



<210> 24

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 24

gcccctcaca tacrg 15



<210> 25 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 25 

ttgtgcacat atcyg 15 



<210> 26

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 26

gatacacagc aaayg 15

<210> 27

<211> 15

<212> DNA

<213> Homo sapiens

<400> 27

gaactctgtg tacrt 15

<210> 28

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 28

gagctcacgc ccamg 15

<210> 29

<211> 15

<212> DNA

<213> Homo sapiens

<400> 29

ggggtgtgtg cackt 15

<210> 30

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 30

ttgggacaga gttsa 15

<210> 31

<211> 15

<212> DNA

<213> Homo sapiens

<400> 31

aggtggtggt ggtsa 15



<210> 32 

<211> 15

<212> DNA 

<213> Homo Sapiens 

<400> 32 

caccaccacc tttyt 15 

<210> 33 

<211> 15 

<212> DNA 

<213> Homo sapiens

<400> 33

catgtgtttg ggara 15

<210> 34 

<211> 15 

<212> DNA 

<213> Homo Sapiens 

<400> 34 

tctcaatctc cctyt 15 

<210> 35 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 35 

ttaccgaagt ggara 15 



<210> 36 

<211> 15 

<212> DNA 

<213> Homo Sapiens 

<400> 36 

ttttctccct tccyg 15 



<210> 37 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 37 

agaggaaaaa agcrg 15

<210> 38

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 38

agtctttttc cttyt 15



<210> 39 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 39 

cataggaagg agara 15 



<210> 40

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 40

cacctgcccc tcart 15

<210> 41

<211> 15

<212> DNA

<213> Homo sapiens

<400> 41

agtctcctgg gaayt 15

<210> 42

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 42

caggagactc ttcyg 15

<210> 43

<211> 15

<212> DNA

<213> Homo sapiens

<400> 43

tcagagttac accrg 15

<210> 44

<211> 15

<212> DNA

<213> Homo Sapiens

<400> 44

gcccttcttc atcyt 15
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<210> 45 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 45 

aggacactgg tgarg 15 

<210> 46 

<211> 15 

<212> DNA 

<213> Homo Sapiens 

<400> 46 

ctgatataca gcayg 15 

<210> 47 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 47

agccttcagc tccrt 15 

<210> 48 

<211> 15 

<212> DNA 

<213> Homo Sapiens

<400> 48 

tgatatacag cacrg 15 

<210> 49 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 49
aagccttcag ctcyg 15 

<210> 50 

<211> 15 

<212> DNA 

<213> Homo Sapiens 

<400> 50 

cctgaaggat ggtyt 15 



<210> 51 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 51 

gtccagatga gaara 15



<210> 52 
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<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 52 

acttgtcaga 10 



<210> 53

<211> 10

<212> DNA

<213> Homo sapiens

<400> 53

ggaatacatg 10

<210> 54

<211> 10

<212> DNA

<213> Homo sapiens

<400> 54

ccagacaccc 10

<210> 55

<211> 10

<212> DNA

<213> Homo sapiens

<400> 55

aggggcttgg 10

<210> 56

<211> 10

<212> DNA

<213> Homo sapiens

<400> 56

cctcacatac 10

<210> 57

<211> 10

<212> DNA

<213> Homo sapiens

<400> 57

tgcacatatc 10

<210> 58

<211> 10

<212> DNA

<213> Homo sapiens

<400> 58

acacagcaaa 10



<210> 59

<211> 10

<212> DNA

<213> Homo sapiens

<400> 59

ctctgtgtac 10



<210> 60

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 60 

ctcacgccca 10 



<210> 61 

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 61

gtgtgtgcac 10 



<210> 62

<211> 10

<212> DNA

<213> Homo sapiens

<400> 62

ggacagagtt 10

<210> 63

<211> 10

<212> DNA

<213> Homo sapiens

<400> 63

tggtggtggt 10

<210> 64

<211> 10

<212> DNA

<213> Homo sapiens

<400> 64

caccaccttt 10

<210> 65

<211> 10

<212> DNA

<213> Homo sapiens

<400> 65

gtgtttggga 10



<210> 66

<211> 10

<212> DNA

<213> Homo sapiens

<400> 66

caatctccct 10



<210> 67 

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 67 

ccgaagtgga 10 

<210> 68

<211> 10

<212> DNA

<213> Homo sapiens

<400> 68

tctcccttcc 10

<210> 69

<211> 10

<212> DNA

<213> Homo sapiens

<400> 69

ggaaaaaagc 10

<210> 70

<211> 10

<212> DNA

<213> Homo sapiens

<400> 70

ctttttcctt 10

<210> 71

<211> 10

<212> DNA

<213> Homo sapiens

<400> 71

aggaaggaga 10

<210> 72

<211> 10

<212> DNA

<213> Homo sapiens

<400> 72

ctgcccctca 10



<210> 73

<211> 10

<212> DNA

<213> Homo sapiens

<400> 73

ctcctgggaa 10

<210> 74

<211> 10

<212> DNA

<213> Homo sapiens

<400> 74

gagactcttc 10

<210> 75

<211> 10

<212> DNA

<213> Homo sapiens

<400> 75

gagttacacc 10

<210> 76

<211> 10

<212> DNA

<213> Homo sapiens

<400> 76

cttcttcatc 10

<210> 77

<211> 10

<212> DNA

<213> Homo sapiens

<400> 77

acactggtga 10

<210> 78

<211> 10

<212> DNA

<213> Homo sapiens

<400> 78

atatacagca 10

<210> 79

<211> 10

<212> DNA

<213> Homo sapiens

<400> 79

cttcagctcc 10

<210> 80

<211> 10

<212> DNA

<213> Homo sapiens

<400> 80

tatacagcac 10



<210> 81 

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 81 

ccttcagctc 10 



<210> 82

<211> 10

<212> DNA

<213> Homo sapiens

<400> 82

gaaggatggt 10

<210> 83

<211> 10

<212> DNA

<213> Homo sapiens

<400> 83

cagatgagaa 10

<210> 84

<211> 2400

<212> DNA

<213> Homo sapiens

<220>

<221> allele

<222> (30)..(30)

<223> PS1: polymorphic base C or T



<220>

<221> misc_feature

<222> (61)..(120)

<223> Ns represent sequence between PS



<220>

<221> allele

<222> (150)..(150)

<223> PS2: polymorphic base G or A



<220>

<221> misc_feature

<222> (181)..(240)

<223> Ns represent sequence between PS



<220>

<221> allele

<222> (270)..(270)

<223> PS3: polymorphic base A or G



<220>

<221> misc_feature

<222> (301)..(360)

<223> Ns represent sequence between PS



<220> 

<221> allele 

<222> (390)..(390)

<223> PS4: polymorphic base C or T 



<220> 

<221> misc_feature 

<222> (421) . . (480) 

<223> Ns represent sequence between PS 
<220> 

<221> allele 

<222> (510) . . (510) 

<223> PS5: polymorphic base C or A 



<220>

<221> misc_feature

<222> (541)..(600)

<223> Ns represent sequence between PS



<220>

<221> allele

<222> (630)..(630)

<223> PS6: polymorphic base G or C



<220> 

<221> misc_feature 

<222> (661) . . (720) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (750) . . (750) 

<223> PS7: polymorphic base C or T 



<220> 

<221> misc_feature

<222> (781)..(840)

<223> Ns represent sequence between PS



<220> 

<221> allele 

<222> (870) . . (870) 

<223> PS8: polymorphic base T or C 



<220> 

<221> misc_feature 
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<222> (901) . . (960) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (990) . . (990) 

<223> PS9: polymorphic base T or C 



<220> 

<221> misc_feature

<222> (1021)..(1080)

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (1110) . . (1110) 

<223> PS10: polymorphic base C or T 



<220> 

<221> misc_feature

<222> (1141) . . (1200) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (1230) . . (1230) 

<223> PS11: polymorphic base C or T 



<220> 

<221> misc_feature

<222> (1261)..(1320)

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (1350) . . (1350) 

<223> PS12: polymorphic base A or G 



<220>

<221> misc_feature

<222> (1381)..(1440)

<223> Ns represent sequence between PS



<220>

<221> allele

<222> (1470)..(1470)

<223> PS13: polymorphic base C or T



<220> 

<221> misc_feature

<222> (1501) . . (1560) 

<223> Ns represent sequence between PS 
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<220> 

<221> allele 

<222> (1590)..(1590)

<223> PS14: polymorphic base A or G 



<220> 

<221> misc_feature 

<222> (1621) . . (1680) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (1710)..(1710)

<223> PS15: polymorphic base C or T 



<220> 

<221> misc_feature 

<222> (1741)..(1800)

<223> Ns represent sequence between PS 



<220>

<221> allele 

<222> (1830) . . (1830) 

<223> PS16: polymorphic base C or T 



<220> 

<221> misc_feature 

<222> (1861) . . (1920) 

<223> Ns represent sequence between PS 
<220> 

<221> allele 

<222> (1950)..(1950)

<223> PS17: polymorphic base G or A 



<220> 

<221> misc_feature 

<222> (1981) . . (2040) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (2070) . . (2070) 

<223> PS18: polymorphic base C or T



<220> 

<221> misc_feature

<222> (2101) . . (2160) 

<223> Ns represent sequence between PS 



<220> 

<221> allele 

<222> (2190) . . (2190) 

<223> PS19: polymorphic base G or A 
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<220> 

<221> misc_feature

<222> (2221)..(2280)

<223> Ns represent sequence between PS



<220>

<221> allele

<222> (2310)..(2310)

<223> PS20: polymorphic base C or T



<220>

<221> misc_feature

<222> (2341)..(2400)

<223> Ns represent sequence 3' to PS20

<400> 84 

tcccctagag cctactttaa cttgtcagay catgtattcc acttcatatg caagaggcat 60 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 120 

tagagatagc tagacacacc cagacacccr ccaagcccct cacatacaga tatgtgcaca 180 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 240

cccagacacc cgccaagccc ctcacatacr gatatgtgca caatgataca cagcaaacgt 300 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 360 

cagatatgtg cacaatgata cacagcaaay gtacacagag ttcagtacac acaaagagct 420 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 480

gttcagtaca cacaaagagc tcacgcccam gtgcacacac ccctcagttg ggacagagtt 540 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 600 

tgcacacacc cctcagttgg gacagagtts accaccacca cctttctccc aaacacatgg 660 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 720 

ttgggacaga gttgaccacc accaccttty tcccaaacac atggcttttg aactgccttt 780 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 840

ggaaagcccc ctgttttctc aatctcccty tccacttcgg taaaatgccc actttctggt 900 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 960 

tgatggccct cattagtcct tggctcttay cttggaagca caggcgctga cagccgtccc 1020

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1080

ccttttcctc ctcatctttt ctcccttccy gcttttttcc tcttccttca aagtcttttt 1140 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1200 

tttcctcttc cttcaaagtc tttttcctty tctccttcct atgctagcct cctagctccc 1260

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1320 

ctctgtcctc ccctcccacc tgcccctcar ttcccaggag actcttccgg tgtaactctg 1380 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1440

cctgcccctc aattcccagg agactcttcy ggtgtaactc tgatggcctc ctctgggtat 1500 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1560 

gaatgattcc ttcccagatg gagactatgr tgccaacctg gaagcagctg ccccctgcca 1620 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1680 

tggatgactc tgcactgccc ttcttcatcy tcaccagtgt cctgggtatc ctagctagca 1740

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1800 

tcctcttcat gcttttcaga cctctcttcy gctggcagct ctgccctggc tggcctgtcc 1860

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1920

ggcagctctg ccctggctgg cctgtcctgr cacagctggc tgtgggcagt gccctcttca 1980 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 2040

tggtggactc tgcaccctga tatacagcay ggagctgaag gctttgcagg ccacacacac 2100 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 2160 

ggtggactct gcaccctgat atacagcacr gagctgaagg ctttgcaggc cacacacact 2220 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 2280 

gccctctctg cccctccctg aaggatggty ttctcatctg gacacccttg gaagcaaatc 2340 

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 2400
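
The feature table for SEQ ID NO:84 follows a regular pattern: each polymorphic site sits at position 30 of a 60-base window of flanking sequence, and each window is followed by a 60-base run of Ns. The sketch below reproduces that arithmetic as a reading aid. The constant and function names are hypothetical, and the pattern is inferred from the <222> coordinates in the listing rather than stated by it.

```python
WINDOW = 60      # bases of real sequence around each polymorphic site
SPACER = 60      # run of Ns that follows each window
SITE_OFFSET = 30 # position of the polymorphic base within its window
SITE_COUNT = 20  # PS1 through PS20


def ps_position(n):
    """1-based position of polymorphic site PSn in SEQ ID NO:84."""
    return (n - 1) * (WINDOW + SPACER) + SITE_OFFSET


def spacer_range(n):
    """1-based inclusive range of the N run that follows the PSn window."""
    start = (n - 1) * (WINDOW + SPACER) + WINDOW + 1
    return (start, start + SPACER - 1)


if __name__ == "__main__":
    for n in range(1, SITE_COUNT + 1):
        print(f"PS{n}: site {ps_position(n)}, Ns {spacer_range(n)}")
```

With 20 sites this gives a total length of 20 x 120 = 2400, matching the <211> entry.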
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